From: Dave Chinner <david@fromorbit.com>
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: [RFC PATCH 0/3] vfs: convert inode cache to hlist-bl
Date: Tue, 6 Apr 2021 22:33:40 +1000
Message-ID: <20210406123343.1739669-1-david@fromorbit.com>
Hi folks,
Recently I've been doing some scalability characterisation of
various filesystems, and one of the limiting factors that has
prevented me from exploring filesystem characteristics is the
inode hash table. Namely, the global inode_hash_lock that protects
it.
This has long been a problem, but I personally haven't cared about
it because, well, XFS doesn't use it and so it's not a limiting
factor for most of my work. However, in trying to characterise the
scalability boundaries of bcachefs, I kept hitting VFS
limitations first. bcachefs hits the inode hash table pretty hard
and it becomes a contention point a lot sooner than it does for
ext4. Btrfs also uses the inode hash, but its namespace doesn't
have the capability to stress the inode hash lock due to it hitting
internal contention first.
Long story short, I did what should have been done a decade or more
ago - I converted the inode hash table to use hlist-bl to split up
the global lock. This is modelled on the dentry cache, with one
minor tweak. That is, the inode hash value cannot be calculated from
the inode, so we have to keep a record of either the hash value or a
pointer to the hlist-bl list head that the inode is hashed into so
that we can lock the correct list on removal.
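For anyone who hasn't looked at how the bit-lock lists work, here is a
rough userspace sketch of the idea using C11 atomics. This is not the
kernel code and not what the patches contain: struct sketch_inode,
struct bl_head, the bucket_*() helpers and the inode_hash_*() functions
are all made-up names for illustration, and it assumes the caller owns
the inode across insert and remove. Bit 0 of each bucket's head pointer
acts as that bucket's spinlock, and the inode remembers which head it
was hashed into so that removal can lock the right chain without
recomputing the hash.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_SIZE 16u	/* illustrative table size, must be a power of two */

/* Bucket head: bit 0 is the lock, the remaining bits point at the first node. */
struct bl_head {
	_Atomic uintptr_t first;
};

/* Hypothetical stand-in for struct inode; only the hashing fields are shown. */
struct sketch_inode {
	unsigned long ino;
	struct sketch_inode *next;
	struct sketch_inode *prev;	/* NULL when first in the bucket */
	struct bl_head *head;		/* bucket we are hashed into, NULL if unhashed */
};

static struct bl_head table[HASH_SIZE];

static struct bl_head *hash_head(unsigned long hashval)
{
	return &table[hashval & (HASH_SIZE - 1)];
}

static void bucket_lock(struct bl_head *h)
{
	/* Spin until this caller is the one that flips bit 0 from 0 to 1. */
	while (atomic_fetch_or_explicit(&h->first, 1, memory_order_acquire) & 1)
		;
}

static void bucket_unlock(struct bl_head *h)
{
	atomic_fetch_and_explicit(&h->first, ~(uintptr_t)1, memory_order_release);
}

/* Both helpers below must be called with the bucket lock held. */
static struct sketch_inode *bucket_first(struct bl_head *h)
{
	uintptr_t v = atomic_load_explicit(&h->first, memory_order_relaxed);

	return (struct sketch_inode *)(v & ~(uintptr_t)1);
}

static void bucket_set_first(struct bl_head *h, struct sketch_inode *n)
{
	/* Keep bit 0 set because the lock is still held. */
	atomic_store_explicit(&h->first, (uintptr_t)n | 1, memory_order_relaxed);
}

static void inode_hash_insert(struct sketch_inode *inode, unsigned long hashval)
{
	struct bl_head *h = hash_head(hashval);
	struct sketch_inode *first;

	bucket_lock(h);
	first = bucket_first(h);
	inode->prev = NULL;
	inode->next = first;
	if (first)
		first->prev = inode;
	inode->head = h;	/* remember the bucket for removal */
	bucket_set_first(h, inode);
	bucket_unlock(h);
}

static void inode_hash_remove(struct sketch_inode *inode)
{
	struct bl_head *h = inode->head;	/* no hash value needed here */

	assert(h != NULL);
	bucket_lock(h);
	if (inode->prev)
		inode->prev->next = inode->next;
	else
		bucket_set_first(h, inode->next);
	if (inode->next)
		inode->next->prev = inode->prev;
	inode->next = inode->prev = NULL;
	inode->head = NULL;
	bucket_unlock(h);
}

static struct sketch_inode *inode_hash_find(unsigned long hashval, unsigned long ino)
{
	struct bl_head *h = hash_head(hashval);
	struct sketch_inode *n;

	bucket_lock(h);
	for (n = bucket_first(h); n; n = n->next)
		if (n->ino == ino)
			break;
	bucket_unlock(h);
	return n;
}
```

Two lookups or removals that land in different buckets never touch the
same lock word, which is the whole point of splitting up the global
lock. The cost is one extra pointer per inode for the stored list head.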
Other than that, this is mostly just a mechanical conversion from
one list and lock type to another. None of the algorithms have
changed and none of the RCU behaviours have changed. But it removes
the inode_hash_lock from the picture and so performance for bcachefs
goes way up and CPU usage for ext4 halves at 16 and 32 threads. At
higher thread counts, we start to hit filesystem and other VFS locks
as the limiting factors. Profiles and performance numbers are in
patch 3 for those that are curious.
I've been running this in benchmarks and perf testing across
bcachefs, btrfs and ext4 for a couple of weeks, and it passes
fstests on ext4 and btrfs without regressions. So now it needs more
eyes and testing and hopefully merging....
Cheers,
Dave.
Thread overview: 15+ messages
2021-04-06 12:33 Dave Chinner [this message]
2021-04-06 12:33 ` [PATCH 1/3] vfs: factor out inode hash head calculation Dave Chinner
2021-04-06 22:06 ` Kent Overstreet
2021-04-06 12:33 ` [PATCH 2/3] hlist-bl: add hlist_bl_fake() Dave Chinner
2021-04-06 22:07 ` Kent Overstreet
2021-04-06 12:33 ` [PATCH 3/3] vfs: inode cache conversion to hash-bl Dave Chinner
2021-04-06 13:28 ` bl_list and lockdep Matthew Wilcox
2021-04-06 21:22 ` Dave Chinner
2021-04-12 15:20 ` Thomas Gleixner
2021-04-12 22:15 ` Dave Chinner
2021-04-12 23:18 ` Thomas Gleixner
2021-04-13 9:58 ` Dave Chinner
2021-04-13 21:24 ` Thomas Gleixner
2021-04-06 22:16 ` [PATCH 3/3] vfs: inode cache conversion to hash-bl Kent Overstreet
2021-04-06 22:03 ` [RFC PATCH 0/3] vfs: convert inode cache to hlist-bl Kent Overstreet