linux-ext4.vger.kernel.org archive mirror
* [PATCH 0/3 RFC] ext4: Speedup orphan file handling
@ 2015-04-16 15:42 Jan Kara
From: Jan Kara @ 2015-04-16 15:42 UTC (permalink / raw)
  To: linux-ext4; +Cc: Jan Kara

  Hello,

orphan inode handling in ext4 is a bottleneck for workloads which heavily
exercise truncate / unlink of small files, as they contend on the global
s_orphan_mutex (when you have fast enough storage). This patch set implements
a new way of handling orphan inodes: instead of using a linked list, we store
the inode numbers of orphaned inodes in a file, which can be implemented in a
more scalable manner than linked list manipulations. See the description of
patch 2/3 for more details.
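
To illustrate why a table of inode numbers can scale better than a linked
list, here is a minimal userspace sketch. It is not the code from patch 2/3:
the slot layout, the slot count, the use of compare-and-swap, and all names
(orphan_block, orphan_add, orphan_del, orphan_count) are assumptions made for
illustration only. The point is that claiming a free slot touches one entry
and needs no global lock, whereas a list insert has to serialize on the list
head.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: one block of the orphan file is an array of slots.
 * A slot holds 0 when free, otherwise the inode number of an orphan.
 * The slot count is illustrative, not taken from the patch. */
#define ORPHAN_SLOTS 16

struct orphan_block {
    _Atomic uint32_t slot[ORPHAN_SLOTS];
};

/* Mark every slot as free. */
void orphan_block_init(struct orphan_block *b)
{
    for (unsigned i = 0; i < ORPHAN_SLOTS; i++)
        atomic_init(&b->slot[i], 0);
}

/* Claim a free slot for inode `ino`, starting the search at `hint`
 * (for example derived from the CPU number, so that threads tend to
 * probe different slots). Returns the slot index, or -1 if full. */
int orphan_add(struct orphan_block *b, uint32_t ino, unsigned hint)
{
    assert(ino != 0); /* 0 is reserved as the "free" marker */
    for (unsigned i = 0; i < ORPHAN_SLOTS; i++) {
        unsigned idx = (hint + i) % ORPHAN_SLOTS;
        uint32_t expected = 0;
        /* Succeeds only if the slot is still free; no lock is taken. */
        if (atomic_compare_exchange_strong(&b->slot[idx], &expected, ino))
            return (int)idx;
    }
    return -1;
}

/* Release a slot once the inode is no longer an orphan. */
void orphan_del(struct orphan_block *b, int idx)
{
    atomic_store(&b->slot[idx], 0);
}

/* What a recovery pass would do: walk all slots and pick up the
 * non-zero entries. Here we only count them. */
unsigned orphan_count(struct orphan_block *b)
{
    unsigned n = 0;
    for (unsigned i = 0; i < ORPHAN_SLOTS; i++)
        if (atomic_load(&b->slot[i]) != 0)
            n++;
    return n;
}
```

A caller would remember the index returned by orphan_add() and pass it to
orphan_del() later. A real implementation also has to journal the modified
block, which this sketch leaves out entirely.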

The patch set achieves significant gains both for a microbenchmark stressing
orphan inode handling (truncating a file byte-by-byte, several threads in
parallel) and for the reaim new_fserver workload. As a highlight, microbenchmark
runtime for 128 threads is reduced from the original 160 s down to 71 s, which
is also the time it takes the benchmark to run when orphan inode handling
is completely disabled. For full numbers you can check commit logs of
patches 2/3 and 3/3. You can also check my presentation from Vault at
http://events.linuxfoundation.org/sites/events/files/slides/ext4-scaling.pdf
for graphs from tests.
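
For readers who want to reproduce the shape of the microbenchmark, the
following is a rough sketch of "truncate a file byte-by-byte, several workers
in parallel". It is not the benchmark used for the numbers above. The
function names, file naming, and sizes are made up, and it uses fork()ed
processes instead of threads only to keep the example free of extra link
flags.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create `path`, extend it to `size` bytes, then shrink it one byte at a
 * time. Every shrinking ftruncate() is the operation the cover letter
 * describes as stressing orphan handling. Returns 0 on success, -1 on error. */
int truncate_byte_by_byte(const char *path, off_t size)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    int ret = -1;
    if (ftruncate(fd, size) == 0) {
        ret = 0;
        for (off_t len = size; len > 0 && ret == 0; len--)
            if (ftruncate(fd, len - 1) != 0)
                ret = -1;
    }
    close(fd);
    unlink(path);
    return ret;
}

/* Run `nworkers` workers in parallel, each on its own file under `dir`.
 * Returns the number of workers that failed (0 means all succeeded). */
int run_workers(const char *dir, int nworkers, off_t size)
{
    assert(nworkers > 0);
    int started = 0, failed = 0;

    for (int i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed++;
            continue;
        }
        if (pid == 0) {
            char path[4096];
            snprintf(path, sizeof(path), "%s/orphan-bench-%d", dir, i);
            _exit(truncate_byte_by_byte(path, size) == 0 ? 0 : 1);
        }
        started++;
    }
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failed++;
    }
    return failed;
}
```

To get meaningful timings, `dir` would have to live on the ext4 filesystem
under test, and the file size and worker count would need to be far larger
than anything suitable for a quick check.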

I'd welcome any review, thoughts, or ideas about the patches.

The kernel part of the feature is complete; what is missing is support for
enabling the feature in mke2fs and tune2fs. I need one reserved inode
(currently that's hacked up: I simply use inode number 12 for simplicity
of testing), so this depends on how exactly we decide to deal with the fact
that we have run out of the old limit on reserved inodes.

1) I can implement support in tune2fs to increase s_first_ino by moving inodes
   that are allocated in the range we want reserved. Then we can just continue
   to use reserved inodes as we did previously. I kind of like this for its
   simplicity: no need for an on-disk format change, no need for kernel changes.

2) Implement a "system directory" for reserved inodes, as we discussed at the
   Ext4 meeting at LSF.

But before I spend time on this, I'd like to hear some thoughts on how to
deal with reserved inodes from other developers...

								Honza


Thread overview: 18+ messages
2015-04-16 15:42 [PATCH 0/3 RFC] ext4: Speedup orphan file handling Jan Kara
2015-04-16 15:42 ` [PATCH 1/3] ext4: Support for checksumming from journal triggers Jan Kara
2015-04-17 19:00   ` Andreas Dilger
2015-04-20  9:07     ` Jan Kara
2015-04-16 15:42 ` [PATCH 2/3] ext4: Speedup ext4 orphan inode handling Jan Kara
     [not found]   ` <CAOQ4uxifVr1swHb5Y2M-TRuzwdDo-z92G6PuHvBGecGZ7nYuHg@mail.gmail.com>
2015-04-17  6:09     ` Amir Goldstein
2015-04-17  7:15     ` Jan Kara
2015-04-17 22:21       ` Andreas Dilger
2015-04-17 23:53   ` Andreas Dilger
2015-04-18  1:13     ` Darrick J. Wong
2015-04-20 12:34       ` Jan Kara
2015-04-20 12:25     ` Jan Kara
2015-04-20 16:35       ` Andreas Dilger
2015-04-21 10:56         ` Jan Kara
2015-04-21 15:46           ` Andreas Dilger
2015-04-18 23:53   ` Theodore Ts'o
2015-04-20  9:32     ` Jan Kara
2015-04-16 15:42 ` [PATCH 3/3] ext4: Improve scalability of ext4 orphan file handling Jan Kara
