From: Dmitry Monakhov <dmonakhov@openvz.org>
To: linux-ext4@vger.kernel.org
Cc: tytso@mit.edu
Subject: Re: [PATCH] ext4: improve smp scalability for inode generation
Date: Wed, 18 Oct 2017 21:08:21 +0300
Message-ID: <87376gpbvu.fsf@openvz.org>
In-Reply-To: <8760bcpdc8.fsf@openvz.org>
[-- Attachment #1: Type: text/plain, Size: 1489 bytes --]
Dmitry Monakhov <dmonakhov@openvz.org> writes:
> ->s_next_generation is protected by s_next_gen_lock, but its usage
> pattern is very primitive and can be replaced with atomic ops.
>
> This significantly improves the create/unlink scenario on SMP systems;
> for example, the lat_fs_create_unlink test [1] on a 2x E5-2680 (32 vCPU)
> system shows a ~20% improvement.
> | nr_tsk | w/o patch | w/ patch |
> |--------+-----------+----------|
> | 1 | 137 | 140 |
> | 2 | 224 | 233 |
> | 4 | 356 | 372 |
> | 8 | 439 | 519 |
> | 16 | 443 | 585 |
> | 32 | 598 | 695 |
> | 64 | 559 | 707 |
> | 128 | 385 | 437 |
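The quoted change swaps a lock/increment/unlock sequence for a single atomic fetch-and-add. The following is a hypothetical user-space model in C11, not the ext4 code: `struct gen_locked`, `struct gen_atomic` and their helpers are illustrative names, and the spinlock is modelled with an `atomic_bool`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical user-space model, not ext4 code: a lock-protected
 * generation counter versus one atomic read-modify-write.
 */

/* Assumption: 32-bit atomics are lock-free (true on common 64-bit
 * targets); otherwise the "atomic" version would hide a lock itself. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "32-bit atomics must be lock-free");

/* Before: counter guarded by a spinlock (stand-in for s_next_gen_lock). */
struct gen_locked {
	atomic_bool lock;
	uint32_t next;
};

static uint32_t gen_locked_next(struct gen_locked *g)
{
	uint32_t v;

	while (atomic_exchange_explicit(&g->lock, true, memory_order_acquire))
		; /* spin until the current holder releases the lock */
	v = g->next++;
	atomic_store_explicit(&g->lock, false, memory_order_release);
	return v;
}

/* After: no lock at all, just one atomic fetch-and-add. */
struct gen_atomic {
	_Atomic uint32_t next;
};

static uint32_t gen_atomic_next(struct gen_atomic *g)
{
	/* Returns the old value, like the post-increment above. Relaxed
	 * ordering still hands out unique values, since every RMW on the
	 * same object is totally ordered; nothing else is published here. */
	return atomic_fetch_add_explicit(&g->next, 1, memory_order_relaxed);
}
```

Both helpers hand out the same sequence of values; the atomic one still moves the cache line between CPUs, but no CPU ever spins waiting for a lock holder, which is the effect the patch relies on.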
FYI, with lazytime enabled lat_fs_create_unlink is ~16x slower.
The reason is obvious: ext4_update_other_inodes_time() increases
contention on inode_hash_lock by a factor of (4k/256) = 16, the number
of 256-byte inodes in a 4k inode table block.
->ext4_do_update_inode
  ->ext4_update_other_inodes_time
    for (i = 0; i < inodes_per_block; i++, ino++, buf += inode_size)
      ->find_inode_nowait
        ->spin_lock(&inode_hash_lock) -> 16x contention increase
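The 16x factor is simply the number of inode slots in one inode table block, 4096 / 256 = 16, with one find_inode_nowait() lookup (and so one inode_hash_lock round trip) per slot in the loop above. A hypothetical counting model follows; all names are illustrative, `first_ino` is an assumed parameter, and it only counts lock acquisitions, ignoring what each lookup actually does.

```c
#include <assert.h>

/* Hypothetical model, not ext4 code: counts how often the global
 * inode_hash_lock would be taken for one lazytime inode write. */

static unsigned long hash_lock_taken;

/* Stand-in for find_inode_nowait(): each call takes the lock once. */
static void model_find_inode_nowait(unsigned long ino)
{
	(void)ino;
	hash_lock_taken++;
}

/* inodes_per_block = block size / on-disk inode size, e.g. 4096 / 256. */
static unsigned int inodes_per_block(unsigned int block_size,
				     unsigned int inode_size)
{
	assert(inode_size != 0 && block_size % inode_size == 0);
	return block_size / inode_size;
}

/* Stand-in for the loop in ext4_update_other_inodes_time(): one hash
 * lookup per inode slot in the block starting at first_ino. */
static void model_update_other_inodes_time(unsigned long first_ino,
					   unsigned int per_block)
{
	unsigned long ino = first_ino;

	for (unsigned int i = 0; i < per_block; i++, ino++)
		model_find_inode_nowait(ino);
}
```

With 4 KiB blocks and 256-byte inodes, one inode write in this model costs 16 acquisitions of the single global lock.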
inode_hash_lock is a known problem. I have patches that convert
inode_hash_table to per-bucket locks, similar to dentry_hash, but they
require massive changes in various filesystems, so it will take a long
time to get them merged. For now, lazytime amplifies the contention
significantly. Maybe it is reasonable to use spin_trylock() inside
find_inode_nowait() to make it a truly lightweight hint?
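Per-bucket locking gives each hash chain its own lock, so lookups that hash to different buckets never contend. A minimal user-space sketch, assuming a fixed 64-bucket table and a trivial `ino & (NR_BUCKETS - 1)` hash; `struct bucket`, `table_insert()` and `table_find()` are hypothetical names, and this is neither the kernel's dentry hash nor the unposted patches mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: one lock per bucket instead of one global lock. */
#define NR_BUCKETS 64
static_assert((NR_BUCKETS & (NR_BUCKETS - 1)) == 0,
	      "NR_BUCKETS must be a power of two for the mask hash");

struct node {
	unsigned long ino;
	struct node *next;
};

struct bucket {
	atomic_bool locked;	/* protects only this bucket's chain */
	struct node *head;
};

/* Static storage: every lock starts unlocked, every chain empty. */
static struct bucket table[NR_BUCKETS];

static void bucket_lock(struct bucket *b)
{
	while (atomic_exchange_explicit(&b->locked, true, memory_order_acquire))
		; /* spin */
}

static void bucket_unlock(struct bucket *b)
{
	atomic_store_explicit(&b->locked, false, memory_order_release);
}

static struct bucket *bucket_of(unsigned long ino)
{
	return &table[ino & (NR_BUCKETS - 1)];
}

static void table_insert(struct node *n)
{
	struct bucket *b = bucket_of(n->ino);

	bucket_lock(b);
	n->next = b->head;
	b->head = n;
	bucket_unlock(b);
}

/* Lookups in different buckets never touch the same lock. */
static struct node *table_find(unsigned long ino)
{
	struct bucket *b = bucket_of(ino);
	struct node *n;

	bucket_lock(b);
	for (n = b->head; n; n = n->next)
		if (n->ino == ino)
			break;
	bucket_unlock(b);
	return n; /* a real cache would take a reference before unlocking */
}
```

With this particular hash, the 16 consecutive inode numbers of one inode table block land in 16 different buckets, so a per-block loop would spread its lookups over 16 locks; the real inode hash differs, so treat this only as an illustration of the idea.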
[-- Attachment #2: lazytime_trylock.patch --]
[-- Type: text/x-patch, Size: 410 bytes --]
diff --git a/fs/inode.c b/fs/inode.c
index d1e35b5..a5b1cba1 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1360,7 +1360,9 @@ struct inode *find_inode_nowait(struct super_block *sb,
 	struct inode *inode, *ret_inode = NULL;
 	int mval;
 
-	spin_lock(&inode_hash_lock);
+	if (!spin_trylock(&inode_hash_lock))
+		return NULL;
+
 	hlist_for_each_entry(inode, head, i_hash) {
 		if (inode->i_sb != sb)
 			continue;
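The patched behaviour can be modelled in user space as below. This is a hypothetical sketch, not fs/inode.c: `model_find_nowait()`, `struct model_inode` and the `atomic_bool` lock are stand-ins. The key property is that a busy lock yields NULL immediately instead of spinning, so the caller must treat "not cached" and "lock busy" the same way.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of the patched lookup; not fs/inode.c. */

static atomic_bool hash_lock;	/* stands in for inode_hash_lock */

/* Succeeds only if the lock was free; never spins. */
static bool model_spin_trylock(atomic_bool *l)
{
	return !atomic_exchange_explicit(l, true, memory_order_acquire);
}

static void model_spin_unlock(atomic_bool *l)
{
	atomic_store_explicit(l, false, memory_order_release);
}

struct model_inode {
	unsigned long ino;
};

/* Returns the cached inode, or NULL if it is not cached OR the lock is
 * busy. Only safe for callers that use the result as a hint. */
static struct model_inode *model_find_nowait(struct model_inode *cache,
					     size_t n, unsigned long ino)
{
	struct model_inode *ret = NULL;

	assert(cache != NULL || n == 0);
	if (!model_spin_trylock(&hash_lock))
		return NULL;	/* contended: give up instead of waiting */
	for (size_t i = 0; i < n; i++) {
		if (cache[i].ino == ino) {
			ret = &cache[i];
			break;
		}
	}
	model_spin_unlock(&hash_lock);
	return ret;
}
```

The trade-off: while another CPU holds the lock, even a cached inode is reported as missing, so the opportunistic update for that neighbour is skipped in that pass. That is only acceptable for callers that use the lookup as a hint.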
Thread overview: 8+ messages
2017-10-18 17:36 [PATCH] ext4: improve smp scalability for inode generation Dmitry Monakhov
2017-10-18 18:08 ` Dmitry Monakhov [this message]
2017-10-19 11:50 ` Andreas Dilger
2017-11-09 3:23 ` Theodore Ts'o
2017-11-10 17:33 ` Dmitry Monakhov
2017-11-10 22:57 ` Theodore Ts'o
2017-11-10 22:39 ` Andreas Dilger
2017-11-10 22:55 ` Andreas Dilger