Re: [ext4] 05c2c00f37: aim7.jobs-per-min -11.8% regression

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jan Kara <jack@suse.cz>
To: lkp@lists.01.org
Subject: Re: [ext4] 05c2c00f37: aim7.jobs-per-min -11.8% regression
Date: Mon, 31 May 2021 18:57:46 +0200	[thread overview]
Message-ID: <20210531165746.GA2610@quack2.suse.cz> (raw)
In-Reply-To: <20210525092205.GA4112@quack2.suse.cz>

[-- Attachment #1: Type: text/plain, Size: 2337 bytes --]

On Tue 25-05-21 11:22:05, Jan Kara wrote:
> On Fri 21-05-21 12:42:16, Theodore Y. Ts'o wrote:
> > On Fri, May 21, 2021 at 11:27:30AM +0200, Jan Kara wrote:
> > > 
> > > OK, thanks for testing. So the orphan code is indeed the likely cause of
> > > this regression but I probably did not guess correctly what is the
> > > contention point there. Then I guess I need to reproduce and do more
> > > digging why the contention happens...
> > 
> > Hmm... what if we only recalculate the superblock checksum when we do
> > a commit, via the callback function from the jbd2 layer to file
> > system?
> 
> I actually have to check whether the regression is there because of the
> additional locking of the buffer_head (because that's the only thing that
> was added to that code in fact, adding some atomic instructions, bouncing
> another cacheline) or because of the checksum computation that moved from
> ext4_handle_dirty_super() closer to actual superblock update under those
> locks.

So I did a few experiments on my test machine. I saw the biggest regression
for creat_clo workload for 7 threads. The results look like:

                         orig                   patched                hack1                  hack2
Hmean     creat_clo-7    36458.33 (   0.00%)    23836.55 * -34.62%*    32608.70 * -10.56%*    37300.18 (   2.31%)

where hack1 means I've removed the lock_buffer() calls from orphan handling
code and hack2 means I've additionally moved checksum recalculation from
under orphan lock. Take the numbers with a grain of salt as they are rather
variable and this is just an average of 5 runs but the tendency is pretty
clear. Both these changes contribute to the regression significantly,
additional locking of the buffer head contributes somewhat more.

I will see how various variants of reducing the contention look like (e.g.
if just using bh lock for everything helps at all). But honestly I don't
want to jump through too big hoops just for this workload - the orphan list
contention is pretty pathological here and if we seriously care about
workload like this we should rather revive the patchset with hashed orphan
list I wrote couple years back... That was able to give like 3x speedup to
workloads like this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

WARNING: multiple messages have this Message-ID (diff)

From: Jan Kara <jack@suse.cz>
To: "Theodore Y. Ts'o" <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>,
	Xing Zhengjun <zhengjun.xing@linux.intel.com>,
	kernel test robot <oliver.sang@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	lkp@lists.01.org, lkp@intel.com
Subject: Re: [LKP] [ext4] 05c2c00f37: aim7.jobs-per-min -11.8% regression
Date: Mon, 31 May 2021 18:57:46 +0200	[thread overview]
Message-ID: <20210531165746.GA2610@quack2.suse.cz> (raw)
In-Reply-To: <20210525092205.GA4112@quack2.suse.cz>

On Tue 25-05-21 11:22:05, Jan Kara wrote:
> On Fri 21-05-21 12:42:16, Theodore Y. Ts'o wrote:
> > On Fri, May 21, 2021 at 11:27:30AM +0200, Jan Kara wrote:
> > > 
> > > OK, thanks for testing. So the orphan code is indeed the likely cause of
> > > this regression but I probably did not guess correctly what is the
> > > contention point there. Then I guess I need to reproduce and do more
> > > digging why the contention happens...
> > 
> > Hmm... what if we only recalculate the superblock checksum when we do
> > a commit, via the callback function from the jbd2 layer to file
> > system?
> 
> I actually have to check whether the regression is there because of the
> additional locking of the buffer_head (because that's the only thing that
> was added to that code in fact, adding some atomic instructions, bouncing
> another cacheline) or because of the checksum computation that moved from
> ext4_handle_dirty_super() closer to actual superblock update under those
> locks.

So I did a few experiments on my test machine. I saw the biggest regression
for creat_clo workload for 7 threads. The results look like:

                         orig                   patched                hack1                  hack2
Hmean     creat_clo-7    36458.33 (   0.00%)    23836.55 * -34.62%*    32608.70 * -10.56%*    37300.18 (   2.31%)

where hack1 means I've removed the lock_buffer() calls from orphan handling
code and hack2 means I've additionally moved checksum recalculation from
under orphan lock. Take the numbers with a grain of salt as they are rather
variable and this is just an average of 5 runs but the tendency is pretty
clear. Both these changes contribute to the regression significantly,
additional locking of the buffer head contributes somewhat more.

I will see how various variants of reducing the contention look like (e.g.
if just using bh lock for everything helps at all). But honestly I don't
want to jump through too big hoops just for this workload - the orphan list
contention is pretty pathological here and if we seriously care about
workload like this we should rather revive the patchset with hashed orphan
list I wrote couple years back... That was able to give like 3x speedup to
workloads like this.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

next prev parent reply	other threads:[~2021-05-31 16:57 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-27 12:08 [ext4] 05c2c00f37: aim7.jobs-per-min -11.8% regression kernel test robot
2021-02-27 12:08 ` kernel test robot
2021-05-20  7:13 ` Xing Zhengjun
2021-05-20  9:51   ` Jan Kara
2021-05-20  9:51     ` [LKP] " Jan Kara
2021-05-21  1:16     ` Xing Zhengjun
2021-05-21  1:16       ` [LKP] " Xing Zhengjun
2021-05-21  9:27       ` Jan Kara
2021-05-21  9:27         ` [LKP] " Jan Kara
2021-05-21 16:42         ` Theodore Y. Ts'o
2021-05-21 16:42           ` [LKP] " Theodore Y. Ts'o
2021-05-25  9:22           ` Jan Kara
2021-05-25  9:22             ` [LKP] " Jan Kara
2021-05-25 17:15             ` Theodore Ts'o
2021-05-25 17:15               ` [LKP] " Theodore Ts'o
2021-05-31 16:57             ` Jan Kara [this message]
2021-05-31 16:57               ` Jan Kara
2021-06-03 16:10               ` Jan Kara
2021-06-03 16:10                 ` [LKP] " Jan Kara
2021-09-03  5:28                 ` Xing Zhengjun
2021-09-03  5:28                   ` [LKP] " Xing Zhengjun
2021-09-03 12:32                   ` Theodore Ts'o
2021-09-03 12:32                     ` [LKP] " Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210531165746.GA2610@quack2.suse.cz \
    --to=jack@suse.cz \
    --cc=lkp@lists.01.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.