From: Jan Kara <jack@suse.cz>
To: linux-ext4@vger.kernel.org
Cc: Jan Kara <jack@suse.cz>
Subject: [PATCH 0/3 RFC] ext4: Speedup orphan file handling
Date: Thu, 16 Apr 2015 17:42:54 +0200
Message-ID: <1429198977-5637-1-git-send-email-jack@suse.cz>
Hello,
orphan inode handling in ext4 is a bottleneck for workloads which heavily
exercise truncate / unlink of small files, as they contend on the global
s_orphan_mutex (when you have fast enough storage). This patch set implements
a new way of handling orphan inodes: instead of using a linked list, we store
the inode numbers of orphaned inodes in a file, which can be implemented in a
more scalable manner than linked list manipulations. See the description of
patch 2/3 for more details.
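To illustrate the idea in isolation, here is a minimal userspace sketch of
slot-based orphan tracking. It is not the code from the patches: the names
(struct orphan_file, orphan_add, orphan_del, orphan_count), the fixed
ORPHAN_SLOTS capacity and the use of C11 atomics are all assumptions made for
the example. The only point it tries to show is that claiming a free slot can
be done without a global lock, whereas a linked list needs the whole list
serialized.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define ORPHAN_SLOTS 1024	/* hypothetical capacity, chosen for the example */

/*
 * Hypothetical in-memory model of an orphan file: one slot per orphaned
 * inode number. Inode number 0 is never valid in ext4, so 0 marks a free slot.
 */
struct orphan_file {
	_Atomic uint32_t slot[ORPHAN_SLOTS];
};

/*
 * Record 'ino' as orphaned. 'start_hint' lets callers begin the search at
 * different places (e.g. derived from a CPU or thread id) so that they do
 * not all fight over the same slots. Returns the slot index, or -1 if full.
 */
int orphan_add(struct orphan_file *of, uint32_t ino, size_t start_hint)
{
	for (size_t i = 0; i < ORPHAN_SLOTS; i++) {
		size_t idx = (start_hint + i) % ORPHAN_SLOTS;
		uint32_t expected = 0;

		/* Claim the slot only if it is still free. */
		if (atomic_compare_exchange_strong(&of->slot[idx], &expected, ino))
			return (int)idx;
	}
	return -1;
}

/* Release a slot previously returned by orphan_add(). */
void orphan_del(struct orphan_file *of, int idx)
{
	atomic_store(&of->slot[idx], 0);
}

/* Recovery-style scan: count the inodes currently recorded as orphaned. */
size_t orphan_count(struct orphan_file *of)
{
	size_t n = 0;

	for (size_t i = 0; i < ORPHAN_SLOTS; i++)
		if (atomic_load(&of->slot[i]) != 0)
			n++;
	return n;
}
```

A statically allocated struct orphan_file starts with all slots free. The
sketch ignores everything that makes the real problem hard (journalling,
on-disk layout, growing the file), so treat it only as a picture of why slots
contend less than a single list head.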
The patch set achieves significant gains both for a microbenchmark stressing
orphan inode handling (truncating a file byte-by-byte, several threads in
parallel) and for the reaim new_fserver workload. As a highlight, the
microbenchmark runtime for 128 threads is reduced from the original 160 s down
to 71 s, which is also the time it takes the benchmark to run when orphan inode
handling is completely disabled. For full numbers you can check the commit logs
of patches 2/3 and 3/3. You can also check my presentation from Vault at
http://events.linuxfoundation.org/sites/events/files/slides/ext4-scaling.pdf
for graphs from the tests.
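For readers who want a feel for the kind of load described above, the
following is a rough sketch of a byte-by-byte truncate stress test. It is not
the benchmark used for the numbers in this mail: the function names, the file
naming scheme and the choice of one private file per thread are assumptions
made for the example.

```c
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-thread parameters. */
struct worker_arg {
	const char *dir;	/* directory on the filesystem under test */
	int id;			/* thread index, used in the file name */
	off_t size;		/* initial file size in bytes */
	int err;		/* set to 1 if any syscall failed */
};

/* Write 'size' zero bytes so the file has real data to truncate. */
static int fill_file(int fd, off_t size)
{
	char buf[4096] = {0};

	while (size > 0) {
		size_t chunk = size < (off_t)sizeof(buf) ? (size_t)size : sizeof(buf);
		ssize_t n = write(fd, buf, chunk);

		if (n <= 0)
			return -1;
		size -= n;
	}
	return 0;
}

/* Create a file, then shrink it one byte at a time down to zero. */
static void *truncate_worker(void *p)
{
	struct worker_arg *w = p;
	char path[4096];
	int fd;

	snprintf(path, sizeof(path), "%s/orphan-bench-%d-%d",
		 w->dir, (int)getpid(), w->id);
	fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0600);
	if (fd < 0) {
		w->err = 1;
		return NULL;
	}
	if (fill_file(fd, w->size) != 0)
		w->err = 1;
	for (off_t len = w->size - 1; len >= 0 && !w->err; len--)
		if (ftruncate(fd, len) != 0)
			w->err = 1;
	close(fd);
	unlink(path);
	return NULL;
}

/* Run 'nthreads' workers in parallel; returns 0 if all of them succeeded. */
int run_truncate_bench(const char *dir, int nthreads, off_t size)
{
	pthread_t *tid = calloc(nthreads, sizeof(*tid));
	struct worker_arg *arg = calloc(nthreads, sizeof(*arg));
	int started = 0, failed = 0;

	if (!tid || !arg) {
		free(tid);
		free(arg);
		return -1;
	}
	for (int i = 0; i < nthreads; i++) {
		arg[i] = (struct worker_arg){ .dir = dir, .id = i, .size = size };
		if (pthread_create(&tid[i], NULL, truncate_worker, &arg[i]) != 0) {
			failed = 1;
			break;
		}
		started++;
	}
	for (int i = 0; i < started; i++) {
		pthread_join(tid[i], NULL);
		failed |= arg[i].err;
	}
	free(tid);
	free(arg);
	return failed ? -1 : 0;
}
```

Something like run_truncate_bench("/mnt/test", 128, 4096), timed externally
and built with -pthread, would approximate the shape of the test; the mount
point and file size here are placeholders, not the parameters behind the
160 s and 71 s figures.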
I'd be happy for any review, thoughts, or ideas about the patches.
The kernel part of the feature is complete; what is missing is support for
enabling the feature in mke2fs and tune2fs. I need one reserved inode
(currently that's hacked up and I simply use inode number 12 for simplicity
of testing), so that depends on how exactly we decide to deal with the issue
that we have run out of reserved inodes within the old limit.
1) I can implement support in tune2fs to increase s_first_ino by moving inodes
that are allocated in the range we want reserved. Then we can just continue
to use reserved inodes as we did previously. I kind of like this for its
simplicity: no need for an on-disk format change, no need for kernel changes.
2) Implement a "system directory" for reserved inodes, as we discussed at the
Ext4 meeting at LSF.
But before I spend time on this, I'd like to hear some thoughts on how to
deal with reserved inodes from other developers...
Honza
Thread overview: 18+ messages
2015-04-16 15:42 Jan Kara [this message]
2015-04-16 15:42 ` [PATCH 1/3] ext4: Support for checksumming from journal triggers Jan Kara
2015-04-17 19:00 ` Andreas Dilger
2015-04-20 9:07 ` Jan Kara
2015-04-16 15:42 ` [PATCH 2/3] ext4: Speedup ext4 orphan inode handling Jan Kara
[not found] ` <CAOQ4uxifVr1swHb5Y2M-TRuzwdDo-z92G6PuHvBGecGZ7nYuHg@mail.gmail.com>
2015-04-17 6:09 ` Amir Goldstein
2015-04-17 7:15 ` Jan Kara
2015-04-17 22:21 ` Andreas Dilger
2015-04-17 23:53 ` Andreas Dilger
2015-04-18 1:13 ` Darrick J. Wong
2015-04-20 12:34 ` Jan Kara
2015-04-20 12:25 ` Jan Kara
2015-04-20 16:35 ` Andreas Dilger
2015-04-21 10:56 ` Jan Kara
2015-04-21 15:46 ` Andreas Dilger
2015-04-18 23:53 ` Theodore Ts'o
2015-04-20 9:32 ` Jan Kara
2015-04-16 15:42 ` [PATCH 3/3] ext4: Improve scalability of ext4 orphan file handling Jan Kara