From: Dave Chinner <david@fromorbit.com>
To: Oliver Sang <oliver.sang@intel.com>
Cc: Dave Chinner <dchinner@redhat.com>,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-kernel@vger.kernel.org,
	"Darrick J. Wong" <djwong@kernel.org>,
	linux-xfs@vger.kernel.org, ying.huang@intel.com,
	feng.tang@intel.com, fengwei.yin@intel.com
Subject: Re: [linus:master] [xfs] 2edf06a50f: fsmark.files_per_sec -5.7% regression
Date: Sat, 13 May 2023 09:05:04 +1000
Message-ID: <20230512230504.GF3223426@dread.disaster.area>
In-Reply-To: <ZF3uXe+cjAsfCLic@xsang-OptiPlex-9020>

On Fri, May 12, 2023 at 03:44:29PM +0800, Oliver Sang wrote:
> hi, Dave Chinner,
> 
> On Tue, May 09, 2023 at 05:10:53PM +1000, Dave Chinner wrote:
> > On Tue, May 09, 2023 at 04:54:33PM +1000, Dave Chinner wrote:
> > > On Tue, May 09, 2023 at 10:13:19AM +0800, kernel test robot wrote:
> > > > 
> > > > 
> > > > Hello,
> > > > 
> > > > kernel test robot noticed a -5.7% regression of fsmark.files_per_sec on:
> > > > 
> > > > 
> > > > commit: 2edf06a50f5bbe664283f3c55c480fc013221d70 ("xfs: factor xfs_alloc_vextent_this_ag() for _iterate_ags()")
> > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > 
> > > This is just a refactoring patch and doesn't change any logic.
> > > Hence I'm sceptical that it actually resulted in a performance
> > > regression. Indeed, the profile indicates a significant change of
> > > behaviour in the allocator and I can't see how the commit above
> > > would cause anything like that.
> > > 
> > > Was this a result of a bisect? If so, what were the original kernel
> > > versions where the regression was detected?
> > 
> > Oh, CONFIG_XFS_DEBUG=y, which means:
> > 
> > static int
> > xfs_alloc_ag_vextent_lastblock(
> >         struct xfs_alloc_arg    *args,
> >         struct xfs_alloc_cur    *acur,
> >         xfs_agblock_t           *bno,
> >         xfs_extlen_t            *len,
> >         bool                    *allocated)
> > {
> >         int                     error;
> >         int                     i;
> > 
> > #ifdef DEBUG
> >         /* Randomly don't execute the first algorithm. */
> >         if (get_random_u32_below(2))
> >                 return 0;
> > #endif
> > 
> > We randomly choose a near block allocation strategy to use to improve
> > code coverage, not the optimal one for IO performance. Hence the CPU
> > usage and allocation patterns that impact IO performance are simply
> > not predictable or reproducible from run to run. So, yeah, trying to
> > bisect a minor difference in performance as a result of this
> > randomness will not be reliable....
> 
> Thanks a lot for guidance!
> 
> we plan to disable XFS_DEBUG (as well as XFS_WARN) in our performance tests.
> want to consult with you if this is the correct thing to do?

You can use XFS_WARN=y with performance tests - that elides all the
debug-specific code that changes behaviour but leaves all the
ASSERT-based correctness checks in the code.
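
As a rough illustration of that difference (a simplified userspace
sketch, not the actual kernel source), the three build flavours can be
modelled as below. The DEBUG and XFS_WARN macro names follow the
discussion above; use_first_algorithm() and assert_warnings are
hypothetical names made up for this example, and the debug case is
modelled as fatal, as it would be with XFS_ASSERT_FATAL=y.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter so the warn-only path is observable in this sketch. */
int assert_warnings;

#if defined(DEBUG)
/* Debug-like build: a failed check is fatal (assumes XFS_ASSERT_FATAL=y). */
#define ASSERT(expr) \
	do { \
		if (!(expr)) { \
			fprintf(stderr, "assertion failed: %s\n", #expr); \
			abort(); \
		} \
	} while (0)
#elif defined(XFS_WARN)
/* Warn-like build: the check still runs, but only warns and carries on. */
#define ASSERT(expr) \
	do { \
		if (!(expr)) { \
			fprintf(stderr, "assertion warning: %s\n", #expr); \
			assert_warnings++; \
		} \
	} while (0)
#else
/* Production-like build: the check is compiled out, expr is not evaluated. */
#define ASSERT(expr) ((void)sizeof(expr))
#endif

/* Returns true when the first (preferred) algorithm should be attempted. */
bool use_first_algorithm(unsigned int len)
{
	ASSERT(len > 0);

#ifdef DEBUG
	/* Only debug builds randomly skip the first algorithm for coverage. */
	if (rand() % 2)
		return false;
#endif
	return true;
}
```

Built with -DXFS_WARN or with neither macro, use_first_algorithm()
always returns true, so behaviour is deterministic from run to run.
Only a -DDEBUG build takes the random path.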

> and I guess we should still keep them in functional tests, am I right?

Yes.

> BTW, regarding this case, we tested again with disabling XFS_DEBUG (as well as
> XFS_WARN), kconfig is attached, only diff with last time is:
> -CONFIG_XFS_DEBUG=y
> -CONFIG_XFS_ASSERT_FATAL=y
> +# CONFIG_XFS_WARN is not set
> +# CONFIG_XFS_DEBUG is not set
> 
> but we still observed similar regression:
> 
> ecd788a92460eef4 2edf06a50f5bbe664283f3c55c4
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    8176057 ± 15%      +4.7%    8558110        fsmark.app_overhead
>      14484            -6.3%      13568        fsmark.files_per_sec

So the application spent 5% more CPU time in userspace, and the rate
at which the kernel processed IO went down by 6%. Seems to me like
everything is running slower, not just the kernel code....

>     100.50 ±  5%      +0.3%     100.83        turbostat.Avg_MHz
>       5.54 ± 11%      +0.3        5.82        turbostat.Busy%
>       1863 ± 19%      -6.9%       1733        turbostat.Bzy_MHz

Evidence that the CPU is running at a 7% lower clock rate when the
results are 6% slower is a bit suspicious to me. Shouldn't the CPU
clock rate be fixed to the same value for A-B performance regression
testing?
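
The arithmetic behind that suspicion can be sketched as follows, using
the rounded values from the table above (14484 -> 13568 files/sec,
1863 -> 1733 Bzy_MHz). Both function names are hypothetical, and the
"tracks the clock" test is only an illustrative heuristic, not how lkp
or turbostat evaluate results.

```c
#include <stdbool.h>

/* Relative change from base to value, in percent. base must be non-zero. */
double pct_change(double base, double value)
{
	return (value - base) / base * 100.0;
}

/*
 * Hypothetical heuristic: a performance drop "tracks" a clock drop when
 * both are negative and the performance drop is no larger in magnitude
 * than the clock drop plus some slack, all expressed in percent.
 */
bool drop_tracks_clock(double perf_pct, double clock_pct, double slack_pct)
{
	return perf_pct < 0.0 && clock_pct < 0.0 &&
	       -perf_pct <= -clock_pct + slack_pct;
}
```

With the table's numbers, pct_change(14484, 13568) is about -6.3 and
pct_change(1863, 1733) is about -7.0, so drop_tracks_clock() returns
true even with zero slack: the throughput drop is fully covered by the
lower busy clock rate.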

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
