linux-ext4.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Theodore Ts'o <tytso@mit.edu>,
	Anand Avati <anand.avati@gmail.com>,
	Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>,
	sandeen@redhat.com, linux-nfs@vger.kernel.org,
	linux-ext4@vger.kernel.org, gluster-devel@nongnu.org
Subject: Re: [Gluster-devel] regressions due to 64-bit ext4 directory cookies
Date: Fri, 15 Feb 2013 13:27:38 +1100
Message-ID: <20130215022738.GD10731@dastard>
In-Reply-To: <20130214220110.GE8343@fieldses.org>

On Thu, Feb 14, 2013 at 05:01:10PM -0500, J. Bruce Fields wrote:
> On Thu, Feb 14, 2013 at 05:10:02PM +1100, Dave Chinner wrote:
> > On Wed, Feb 13, 2013 at 05:20:52PM -0500, Theodore Ts'o wrote:
> > > Telldir() and seekdir() are basically implementation horrors for any
> > > file system that is using anything other than a simple array of
> > > directory entries ala the V7 Unix file system or the BSD FFS.  For any
> > > file system which is using a more advanced data structure, like
> > > b-trees, hash trees, etc., there **can't** possibly be an "offset" into a
> > > readdir stream. 
> > 
> > I'll just point you to this:
> > 
> > http://marc.info/?l=linux-ext4&m=136081996316453&w=2
> > 
> > so you can see that XFS implements what you say can't possibly be
> > done. ;)
> > 
> > FWIW, that post only talked about the data segment. I didn't mention
> > that XFS has 2 other segments in the directory file (both beyond
> > EOF) for the directory data indexes. One contains the name-hash btree
> > index used for name based lookups and the other contains a freespace
> > index for tracking free space in the data segment.
> 
> OK, so in some sense that reduces the problem to that of implementing
> readdir cookies for directories that are stored in a simple linear
> array.

*nod*

> Which I should know how to do but I don't: I guess all you need is a
> provision for making holes on remove (so that you aren't required to move
> existing entries, messing up offsets for concurrent readers)?

Exactly.

The data segment is a virtual mapping that is maintained by the
extent tree, so we can simply punch holes in it for directory blocks
that are empty and no longer referenced. i.e. the data segment
really is just a sparse file.
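To make the "holes on remove" idea concrete, here is a minimal toy model in Python, assuming a directory that is a sparse map of logical block number to a fixed number of entry slots, with the readdir cookie taken to be the entry's position (block * SLOTS_PER_BLOCK + slot). `SparseDir`, `SLOTS_PER_BLOCK`, `place`, `remove` and `readdir` are hypothetical names for illustration only; this is not XFS's on-disk format or its real cookie encoding. Removing an entry leaves a hole instead of compacting, and a block that becomes completely empty is dropped from the map (the analogue of punching it out of the sparse file), so cookies held by a reader stay valid.

```python
from typing import Dict, Iterator, List, Optional, Tuple

SLOTS_PER_BLOCK = 4  # hypothetical: directory entries per block in this toy model


class SparseDir:
    """Toy linear directory: logical block number -> fixed-size list of slots.

    Block numbers missing from the dict are holes. An entry's cookie is its
    position, which never changes while the entry exists.
    """

    def __init__(self) -> None:
        self.blocks: Dict[int, List[Optional[str]]] = {}

    def place(self, block: int, slot: int, name: str) -> int:
        """Store an entry at a given position and return its cookie."""
        slots = self.blocks.setdefault(block, [None] * SLOTS_PER_BLOCK)
        if slots[slot] is not None:
            raise FileExistsError(f"slot {block}/{slot} already in use")
        slots[slot] = name
        return block * SLOTS_PER_BLOCK + slot

    def remove(self, name: str) -> None:
        """Remove an entry by leaving a hole; other entries never move."""
        for block, slots in list(self.blocks.items()):
            if name in slots:
                slots[slots.index(name)] = None
                if all(s is None for s in slots):
                    # Block is now empty: "punch" it out of the sparse map.
                    del self.blocks[block]
                return
        raise FileNotFoundError(name)

    def readdir(self, cookie: int = 0) -> Iterator[Tuple[str, int]]:
        """Yield (name, next_cookie) for live entries at or after cookie."""
        for block in sorted(self.blocks):
            slots = self.blocks.get(block)
            if slots is None:
                continue  # block was punched after this scan started
            for slot, name in enumerate(slots):
                pos = block * SLOTS_PER_BLOCK + slot
                if pos >= cookie and name is not None:
                    yield name, pos + 1


d = SparseDir()
for i, n in enumerate(["a", "b", "c", "d", "e", "f"]):
    d.place(i // SLOTS_PER_BLOCK, i % SLOTS_PER_BLOCK, n)

first_page = list(d.readdir(0))[:3]   # a reader gets a, b, c
cookie = first_page[-1][1]            # and keeps only the cookie
d.remove("a")                         # concurrent removals leave holes...
d.remove("b")
rest = [n for n, _ in d.readdir(cookie)]  # ...so resuming neither skips nor repeats
```

A reader that resumes from its saved cookie after entries before it were removed still sees exactly the remaining entries, because nothing was shifted underneath it.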

The result of doing block mapping this way is that the freespace
tracking segment actually only needs to track space in partially
used blocks. Hence we only need to allocate new blocks when the
freespace map empties, and we work out where to allocate the new
block in the virtual map by doing an extent tree lookup to find the
first hole....
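The allocation policy described above can be sketched with a second toy model, again assuming fixed-size blocks of entry slots. `DirAllocator`, `partial`, `_first_hole`, `add` and `remove` are hypothetical names, and a linear scan for the lowest unmapped block stands in for the extent tree lookup; XFS's real freespace index is structured differently. The point it shows: because empty blocks are punched out, the freespace map only ever holds partially used blocks, and a new block is allocated (into the first hole) only when that map is empty.

```python
from typing import Dict, List, Optional

SLOTS_PER_BLOCK = 4  # hypothetical: directory entries per block in this toy model


class DirAllocator:
    def __init__(self) -> None:
        # Sparse "data segment": logical block number -> entry slots.
        self.blocks: Dict[int, List[Optional[str]]] = {}
        # Freespace map: block -> free slot count, for partially used blocks only.
        self.partial: Dict[int, int] = {}

    def _first_hole(self) -> int:
        """Stand-in for an extent tree lookup: lowest unmapped block number."""
        block = 0
        while block in self.blocks:
            block += 1
        return block

    def _update_freespace(self, block: int) -> None:
        free = self.blocks[block].count(None)
        if 0 < free < SLOTS_PER_BLOCK:
            self.partial[block] = free
        else:
            self.partial.pop(block, None)  # full blocks are not tracked

    def add(self, name: str) -> int:
        """Insert an entry and return its cookie (block * SLOTS_PER_BLOCK + slot)."""
        if self.partial:
            block = min(self.partial)  # any partially used block would do
        else:
            # Freespace map is empty: only now allocate a block, in the first hole.
            block = self._first_hole()
            self.blocks[block] = [None] * SLOTS_PER_BLOCK
        slots = self.blocks[block]
        slot = slots.index(None)
        slots[slot] = name
        self._update_freespace(block)
        return block * SLOTS_PER_BLOCK + slot

    def remove(self, cookie: int) -> None:
        block, slot = divmod(cookie, SLOTS_PER_BLOCK)
        self.blocks[block][slot] = None
        if all(s is None for s in self.blocks[block]):
            del self.blocks[block]  # punch the empty block; it becomes a hole
            self.partial.pop(block, None)
        else:
            self._update_freespace(block)


d = DirAllocator()
cookies = [d.add(f"f{i}") for i in range(12)]  # fills blocks 0, 1 and 2
for c in range(4, 8):
    d.remove(c)                 # empties block 1, which is punched out
new_cookie = d.add("x")         # freespace map empty -> reuse hole at block 1
```

Note that the emptied block never shows up in the freespace map: it simply disappears from the mapping, and the next allocation finds it again as the first hole.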

> Purely out of curiosity: is there a more detailed writeup of XFS's
> directory format?  (Or a pointer to a piece of the code a person could
> understand without losing a month to it?)

Not really. There's documentation of the on-disk structures, but
it's a massive leap from there to understanding the structure and
how it all ties together.  I've been spending the past couple of
months deep in the depths of the XFS directory code so how it all
works is front-and-center in my brain right now...

That said, the thought had crossed my mind that there's a couple
of LWN articles/conference talks I could put together as a brain
dump. ;)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
