From: Theodore Ts'o <tytso@mit.edu>
To: Andreas Dilger <adilger@dilger.ca>
Cc: "Niklas Hambüchen" <niklas@nh2.me>,
	"Linux FS-devel Mailing List" <linux-fsdevel@vger.kernel.org>,
	"Paul Eggert" <eggert@cs.ucla.edu>,
	"Jim Meyering" <jim@meyering.net>,
	"Pádraig Brady" <P@draigbrady.com>
Subject: Re: O(n^2) deletion performance
Date: Mon, 1 Jan 2018 23:33:54 -0500
Message-ID: <20180102043354.GH2532@thunk.org>
In-Reply-To: <3FF0FC46-28A8-406C-A528-6F900735BEA1@dilger.ca>

On Mon, Jan 01, 2018 at 07:49:55PM -0700, Andreas Dilger wrote:
> 
> At one time we discussed changing inode number allocation to be
> piecewise linear for inodes within the same directory.  As a directory
> grows larger, it could grab a chunk of free inodes in the inode table,
> then map the new filename hash to the range of free inodes, and use
> that when selecting the new inode number.  As that chunk is filled up,
> a new, larger chunk of free inodes would be selected and new filenames
> would be mapped into the new chunk.
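
For concreteness, a minimal userspace sketch of the scheme quoted
above might look something like this (toy in-memory bitmap and
made-up helper names, not the ext4 allocator): each directory holds a
reservation window of free inode numbers, new names hash into the
window with linear probing on collision, and a larger window is
reserved once the current one fills up.

/* Minimal userspace sketch -- toy in-memory bitmap, hypothetical
 * helpers, not the ext4 allocator.  Build with: cc -o ichunk ichunk.c
 */
#include <stdint.h>
#include <stdio.h>

#define NINODES 65536                   /* toy flat inode space */
static uint8_t inode_used[NINODES];     /* stand-in for the inode bitmap */

struct dir_ichunk {                     /* per-directory reservation window */
        uint32_t start, size, used;
};

static uint32_t name_hash(const char *name)
{
        uint32_t h = 5381;              /* djb2, standing in for the dirhash */

        while (*name)
                h = h * 33 + (uint8_t)*name++;
        return h;
}

/* Toy reservation: first run of 'want' consecutive free inodes. */
static uint32_t reserve_range(uint32_t want)
{
        for (uint32_t i = 1; i + want <= NINODES; i++) {
                uint32_t j;

                for (j = 0; j < want && !inode_used[i + j]; j++)
                        ;
                if (j == want)
                        return i;       /* [i, i + want) is all free */
        }
        return 0;                       /* out of space */
}

static uint32_t alloc_inode(struct dir_ichunk *c, const char *name)
{
        if (c->used == c->size) {       /* first call or chunk exhausted: */
                c->size = c->size ? c->size * 2 : 16;
                c->start = reserve_range(c->size);
                c->used = 0;
                if (!c->start)
                        return 0;
        }

        /* Map the name hash to a slot in the window, probe on collision. */
        uint32_t off = name_hash(name) % c->size;

        for (uint32_t i = 0; i < c->size; i++) {
                uint32_t ino = c->start + (off + i) % c->size;

                if (!inode_used[ino]) {
                        inode_used[ino] = 1;
                        c->used++;
                        return ino;
                }
        }
        return 0;                       /* can't happen while used < size */
}

int main(void)
{
        struct dir_ichunk dir = { 0, 0, 0 };
        const char *names[] = { "a.c", "b.c", "Makefile", "README" };

        for (unsigned i = 0; i < sizeof(names) / sizeof(names[0]); i++)
                printf("%-10s -> inode %u\n", names[i],
                       alloc_inode(&dir, names[i]));
        return 0;
}
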

Well, it's not so simple.  Remember that there are only 16 inodes per
4k inode table block.  And only 32,768 inodes per block group.  In the
workloads discussed in the coreutils bug, there are 1 million to 32
million files being created in a single directory.  At that point,
unless we start doing truly ridiculous readaheads in the inode table,
the disk is going to be randomly seeking no matter what you do.  Also,
if we try to stay strictly within the inodes in the block group,
assuming that the average file size is larger than 4k --- generally a
safe bet --- this will tend to separate the data blocks from the
inodes, which will increase latency when reading files.
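
To put numbers on that: at 16 inodes per 4k block, 32 million inodes
occupy about 2 million inode table blocks --- roughly 8GB of inode
table --- and at 32,768 inodes per block group they span close to a
thousand block groups.  So there is a large amount of widely
scattered metadata to touch no matter how the inodes are grouped.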

And most of the time, optimizing for reading files makes sense (e.g.,
think /usr/include), because that tends to happen more often than rm
-rf workloads.

The bottom line is that this gets tricky, especially if the directory
is dynamically growing and shrinking.  You might be deleting from the
first chunk while allocating from the second chunk, or maybe the
third.  It's relatively easy to optimize for specific workloads ---
but even if that were a good idea, "rm -rf" of zero-length files is
not the first workload I would be hyper-optimizing for.

Cheers,

					- Ted

Thread overview: 11+ messages
2018-01-02  0:21 O(n^2) deletion performance Niklas Hambüchen
2018-01-02  1:20 ` Niklas Hambüchen
2018-01-02  1:59 ` Theodore Ts'o
2018-01-02  2:49   ` Andreas Dilger
2018-01-02  4:27     ` Jim Meyering
2018-01-02  6:22       ` Theodore Ts'o
2018-01-04  4:16         ` Jim Meyering
2018-01-04  7:16           ` Theodore Ts'o
2018-01-04 11:42           ` Dave Chinner
2018-01-02  4:33     ` Theodore Ts'o [this message]
2018-01-02  4:54 ` Dave Chinner
