linux-ext4.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Andreas Dilger <adilger@dilger.ca>, Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org
Subject: Re: [PATCH 2/3] ext4: Speedup ext4 orphan inode handling
Date: Mon, 20 Apr 2015 14:34:00 +0200	[thread overview]
Message-ID: <20150420123400.GG3117@quack.suse.cz> (raw)
In-Reply-To: <20150418011300.GE11592@birch.djwong.org>

On Fri 17-04-15 18:13:00, Darrick J. Wong wrote:
> On Fri, Apr 17, 2015 at 05:53:03PM -0600, Andreas Dilger wrote:
> > On Apr 16, 2015, at 9:42 AM, Jan Kara <jack@suse.cz> wrote:
> > > 
> > > Ext4 orphan inode handling is a bottleneck for workloads which heavily
> > > truncate / unlink small files since it contends on the global
> > > s_orphan_mutex lock (and generally it's difficult to improve scalability
> > > of the ondisk linked list of orphaned inodes).
> > > 
> > > This patch implements new way of handling orphan inodes. Instead of
> > > linking orphaned inode into a linked list, we store it's inode number in
> > > a new special file which we call "orphan file". Currently we still
> > > protect the orphan file with a spinlock for simplicity but even in this
> > > setting we can substantially reduce the length of the critical section
> > > and thus speedup some workloads.
> > > 
> > > Note that the change is backwards compatible when the filesystem is
> > > clean - the existence of the orphan file is a compat feature, we set
> > > another ro-compat feature indicating orphan file needs scanning for
> > > orphaned inodes when mounting filesystem read-write. This ro-compat
> > > feature gets cleared on unmount / remount read-only.
> > > 
> > > Some performance data from 48 CPU Xeon Server with 32 GB of RAM,
> > > filesystem located on ramdisk, average of 5 runs:
> > > 
> > > stress-orphan (microbenchmark truncating files byte-by-byte from N
> > > processes in parallel)
> > > 
> > > Threads Time            Time
> > >        Vanilla         Patched
> > >  1       1.602800        1.260000
> > >  2       4.292200        2.455000
> > >  4       6.202800        3.848400
> > >  8      10.415000        6.833000
> > > 16      18.933600       12.883200
> > > 32      38.517200       25.342200
> > > 64      79.805000       50.918400
> > > 128     159.629200      102.666000
> > > 
> > > reaim new_fserver workload (tweaked to avoid calling sync(1) after every
> > > operation)
> > > 
> > > Threads Jobs/s          Jobs/s
> > >        Vanilla         Patched
> > >  1      24375.00        22941.18
> > > 25     162162.16       278571.43
> > > 49     222209.30       331626.90
> > > 73     280147.60       419447.52
> > > 97     315250.00       481910.83
> > > 121     331157.90       503360.00
> > > 145     343769.00       489081.08
> > > 169     355549.56       519487.68
> > > 193     356518.65       501800.00
> > > 
> > > So in both cases we see significant wins all over the board.
> > 
> > One thing I noticed looking at this patch is that there is quite a bit
> > of orphan handling code now.  Is it worthwhile to move it into its own
> > file and make super.c a bit smaller?
> > 
> > > Signed-off-by: Jan Kara <jack@suse.cz>
> > > ---
> > > fs/ext4/ext4.h  |  52 +++++++++++--
> > > fs/ext4/namei.c |  95 +++++++++++++++++++++--
> > > fs/ext4/super.c | 237 ++++++++++++++++++++++++++++++++++++++++++++++++--------
> > > 3 files changed, 341 insertions(+), 43 deletions(-)
> > > 
> > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > > index abed83485915..768a8b9ee2f9 100644
> > > --- a/fs/ext4/ext4.h
> > > +++ b/fs/ext4/ext4.h
> > > @@ -208,6 +208,7 @@ struct ext4_io_submit {
> > > #define EXT4_UNDEL_DIR_INO	 6	/* Undelete directory inode */
> > > #define EXT4_RESIZE_INO		 7	/* Reserved group descriptors inode */
> > > #define EXT4_JOURNAL_INO	 8	/* Journal inode */
> > > +#define EXT4_ORPHAN_INO		 9	/* Inode with orphan entries */
> > 
> > Contrary to your patch description that said this was using ino #12,
> > this conflicts with EXT2_EXCLUDE_INO for snapshots.  Why not use
> > s_last_orphan in the superblock to reference the orphan inode?  Since
> > this feature already requires a new EXT2_FEATURE_RO_COMPAT flag, the
> > existing orphan inode number could be reused.  See below how this could
> > still in the ENOSPC case.
> > 
> > > +static inline int ext4_inodes_per_orphan_block(struct super_block *sb)
> > > +{
> > > +	/* We reserve 1 entry for block checksum */
> > 
> > Would be good to improve this comment to say "first entry" or "last entry".
> > 
> > > +	return sb->s_blocksize / sizeof(u32) - 1;
> > > +}
> 
> Just to be clear, the format of each orphaned inode block is ... an array of
> 32-bit inode numbers with a 32-bit checksum at the end?
  Yes.

> Shouldn't we have a magic number somewhere for positive identification?
  We could use another slot at the end of block for magic number. The
checksum actually uniquely identifies that the block belongs to the orphan
file because it has the inode number in it but it's so much easier to look
into the block and see the magic there during disaster recovery / bug
analysis that I guess it's worth the extra space.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

  reply	other threads:[~2015-04-20 12:34 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-16 15:42 [PATCH 0/3 RFC] ext4: Speedup orphan file handling Jan Kara
2015-04-16 15:42 ` [PATCH 1/3] ext4: Support for checksumming from journal triggers Jan Kara
2015-04-17 19:00   ` Andreas Dilger
2015-04-20  9:07     ` Jan Kara
2015-04-16 15:42 ` [PATCH 2/3] ext4: Speedup ext4 orphan inode handling Jan Kara
     [not found]   ` <CAOQ4uxifVr1swHb5Y2M-TRuzwdDo-z92G6PuHvBGecGZ7nYuHg@mail.gmail.com>
2015-04-17  6:09     ` Amir Goldstein
2015-04-17  7:15     ` Jan Kara
2015-04-17 22:21       ` Andreas Dilger
2015-04-17 23:53   ` Andreas Dilger
2015-04-18  1:13     ` Darrick J. Wong
2015-04-20 12:34       ` Jan Kara [this message]
2015-04-20 12:25     ` Jan Kara
2015-04-20 16:35       ` Andreas Dilger
2015-04-21 10:56         ` Jan Kara
2015-04-21 15:46           ` Andreas Dilger
2015-04-18 23:53   ` Theodore Ts'o
2015-04-20  9:32     ` Jan Kara
2015-04-16 15:42 ` [PATCH 3/3] ext4: Improve scalability of ext4 orphan file handling Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150420123400.GG3117@quack.suse.cz \
    --to=jack@suse.cz \
    --cc=adilger@dilger.ca \
    --cc=darrick.wong@oracle.com \
    --cc=linux-ext4@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).