All of lore.kernel.org
 help / color / mirror / Atom feed
From: Theodore Ts'o <tytso@mit.edu>
To: Jim Meyering <jim@meyering.net>
Cc: "Andreas Dilger" <adilger@dilger.ca>,
	"Niklas Hambüchen" <niklas@nh2.me>,
	"Linux FS-devel Mailing List" <linux-fsdevel@vger.kernel.org>,
	"Paul Eggert" <eggert@cs.ucla.edu>,
	"Pádraig Brady" <P@draigbrady.com>
Subject: Re: O(n^2) deletion performance
Date: Thu, 4 Jan 2018 02:16:55 -0500	[thread overview]
Message-ID: <20180104071655.GG23371@thunk.org> (raw)
In-Reply-To: <CA+8g5KFGDG6=R3vY4jvWuSZjap4sCxrSnK1sfTsPnjjJzbYEQw@mail.gmail.com>

On Wed, Jan 03, 2018 at 08:16:58PM -0800, Jim Meyering wrote:
> 
> Still wondering how this happened... deliberate optimization for
> something else, probably.
> And wishing I'd written a relative (not absolute) test for it in 2008,
> so I would have noticed sooner.
> In 2008 when I wrote this coreutils extN performance test:
> 
>   https://git.savannah.gnu.org/cgit/coreutils.git/tree/tests/rm/ext3-perf.sh
> 
> there was no O(N^2) or even "just" O(N^1.5) component when using the
> then-just-improved rm. Many of us plotted the curves.

How high (how many files) did you plot the curves back in 2008?  Do
you remember?

> Any idea when ext4's unlink became more expensive?

Ext4's unlink hasn't really changed.  What *has* changed over time is
the block and inode allocation algorithms.  2008 is before we
optimized allocation algorithms to read/write operations to large
files more efficient, and to try to avoid free space fragmention as
the file system aged.

Given that on large / fast NVMe device everything looks linear, it's
pretty clear that what you're seeing is essentially seek time on
spinning rust, and probably SSD GC overhead on flash devices.

> I've just run a test on the spinning-disk file system mentioned above,
> and it took 75 minutes to delete 12.8M entries. That's rather nasty.

Yes, but how long did it take to *create* the 12.8M entries?  My guess
is that it was roughly comparable?

							- Ted

  reply	other threads:[~2018-01-04  7:17 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-02  0:21 O(n^2) deletion performance Niklas Hambüchen
2018-01-02  1:20 ` Niklas Hambüchen
2018-01-02  1:59 ` Theodore Ts'o
2018-01-02  2:49   ` Andreas Dilger
2018-01-02  4:27     ` Jim Meyering
2018-01-02  6:22       ` Theodore Ts'o
2018-01-04  4:16         ` Jim Meyering
2018-01-04  7:16           ` Theodore Ts'o [this message]
2018-01-04 11:42           ` Dave Chinner
2018-01-02  4:33     ` Theodore Ts'o
2018-01-02  4:54 ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180104071655.GG23371@thunk.org \
    --to=tytso@mit.edu \
    --cc=P@draigbrady.com \
    --cc=adilger@dilger.ca \
    --cc=eggert@cs.ucla.edu \
    --cc=jim@meyering.net \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=niklas@nh2.me \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.