From: Jeff Layton <jlayton@redhat.com>
To: lkp@lists.01.org
Subject: Re: [lkp-robot] [iversion] c0cef30e4f: aim7.jobs-per-min -18.0% regression
Date: Tue, 27 Feb 2018 08:29:09 -0500 [thread overview]
Message-ID: <1519738149.4300.45.camel@redhat.com> (raw)
In-Reply-To: <8b48844f-7f9a-a9d7-b5bc-3bc403e0fa78@intel.com>
[-- Attachment #1: Type: text/plain, Size: 6517 bytes --]
On Tue, 2018-02-27 at 15:42 +0800, kemi wrote:
>
> On 2018年02月26日 20:33, Jeff Layton wrote:
> > On Mon, 2018-02-26 at 06:43 -0500, Jeff Layton wrote:
> > > On Mon, 2018-02-26 at 16:38 +0800, Ye Xiaolong wrote:
> > > > On 02/25, Jeff Layton wrote:
> > > > > On Sun, 2018-02-25 at 23:05 +0800, kernel test robot wrote:
> > > > > > Greeting,
> > > > > >
> > > > > > FYI, we noticed a -18.0% regression of aim7.jobs-per-min due to commit:
> > > > > >
> > > > > >
> > > > > > commit: c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e ("iversion: make inode_cmp_iversion{+raw} return bool instead of s64")
> > > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > > > >
> > > > > > in testcase: aim7
> > > > > > on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
> > > > > > with following parameters:
> > > > > >
> > > > > > disk: 4BRD_12G
> > > > > > md: RAID0
> > > > > > fs: xfs
> > > > > > test: disk_src
> > > > > > load: 3000
> > > > > > cpufreq_governor: performance
> > > > > >
> > > > > > test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system.
> > > > > > test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
> > > > > >
> > > > > >
> > > > >
> > > > > I'm a bit suspicious of this result.
> > > > >
> > > > > This patch only changes inode_cmp_iversion{+raw} (since renamed to
> > > > > inode_eq_iversion{+raw}), and that neither should ever be called from
> > > > > xfs. The patch is fairly trivial too, and I wouldn't expect a big
> > > > > performance hit.
> > > >
> > > > I tried to queue 4 more times test for both commit c0cef30e4f and its parent,
> > > > the result seems quite stable.
> > > >
> > > > c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e:
> > > > "aim7.jobs-per-min": [
> > > > 32964.01,
> > > > 32938.68,
> > > > 33068.18,
> > > > 32886.32,
> > > > 32843.72,
> > > > 32798.83,
> > > > 32898.34,
> > > > 32952.55
> > > > ],
> > > >
> > > > 3da90b159b146672f830bcd2489dd3a1f4e9e089:
> > > > "aim7.jobs-per-min": [
> > > > 40239.65,
> > > > 40163.33,
> > > > 40353.32,
> > > > 39976.9,
> > > > 40185.75,
> > > > 40411.3,
> > > > 40213.58,
> > > > 39900.69
> > > > ],
> > > >
> > > > Any other test data you may need?
> > > >
> > > > >
> > > > > Is IMA involved here at all? I didn't see any evidence of it, but the
> > > > > kernel config did have it enabled.
> > > > >
> > > >
> > > > Sorry, not quite familiar with IMA, could you tell more about how to check it?
> > > >
> > >
> > > Thanks for retesting it, but I'm at a loss for why we're seeing this:
> > >
> > > IMA is the the integrity management subsystem. It will use the iversion
> > > field to determine whether to remeasure files during remeasurement. It
> > > looks like the kernel config has it enabled, but it doesn't look like
> > > it's in use, based on the info in the initial report.
> > >
> > > This patch only affects two inlined functions inode_cmp_iversion and
> > > inode_cmp_iversion_raw. The patch is pretty trivial (as Linus points
> > > out). These functions are only called from IMA and fs-specific code
> > > (usually in readdir implementations to detect directory changes).
> > >
> > > XFS does not call either of these functions however, so I'm a little
> > > unclear on how this patch could slow anything down on this test. The
> > > only thing I can think to do here would be to profile this and see what
> > > stands out.
> > >
> > > Note that we do need to keep this in perspective too. This 18%
> > > regression on this test follows around a ~230% improvement that occurred
> > > when we merged the bulk of these patches. It's should still be quite a
> > > bit faster than the v4.15 in this regard.
> > >
> > > Still, it'd be good to understand what's going on here.
> > >
> > >
> >
> > Could we see the dmesg from this boot? It'd be good to confirm that IMA
> > is not involved here, as that's the only place that I can see that would
> > call into this code at all here.
> >
>
> See attachment for info on dmesg/perf-profile/compare_result.
> Feel free to let Xiaolong or me know if anything else you would like to check.
>
Many thanks,
Only one caller of the functions touched by this patch shows up in the
profiles: ima_file_free. That calls ima_check_last_writer, which calls
inode_cmp_iversion. The lines from the profiles show:
3da90b159b146672f830bcd2489dd3a1f4e9e089:
0.00% 0.00% [kernel.kallsyms] [k] ima_file_free
c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e:
0.01% 0.01% [kernel.kallsyms] [k] ima_file_free
Seems pretty insignificant, but perhaps that is somehow accounting for
the difference. This is called when a file is freed so there could be an
effect I guess if we're closing a lot of files for write.
Looking at the disassembly from the builds on my box there is some
slight difference, since we did alter the implementation.
inode->iversion is 0x150 bytes into the struct on my builds:
3da90b159b146672f830bcd2489dd3a1f4e9e089:
0xffffffff813ae858 <+136>: je 0xffffffff813ae871 <ima_file_free+161>
0xffffffff813ae85a <+138>: mov 0x150(%rbp),%rsi
0xffffffff813ae861 <+145>: mov 0x20(%rax),%rcx
0xffffffff813ae865 <+149>: and $0xfffffffffffffffe,%rsi
0xffffffff813ae869 <+153>: add %rcx,%rcx
0xffffffff813ae86c <+156>: cmp %rcx,%rsi
0xffffffff813ae86f <+159>: je 0xffffffff813ae899 <ima_file_free+201>
c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e:
0xffffffff813ae828 <+136>: je 0xffffffff813ae83a <ima_file_free+154>
0xffffffff813ae82a <+138>: mov 0x150(%rbp),%rcx
0xffffffff813ae831 <+145>: shr %rcx
0xffffffff813ae834 <+148>: cmp %rcx,0x20(%rax)
0xffffffff813ae838 <+152>: je 0xffffffff813ae862 <ima_file_free+194>
The patched one looks like it ought to be more efficient. It's certainly
fewer instructions and it doesn't touch %rsi now. Are shifts more
expensive?
In any case, what might be good at this point as a sanity check is to
turn off CONFIG_IMA and recheck this. That should tell us whether we're
on the right track here. With that disabled, this patch should have no
effect on anything at all.
Thanks,
--
Jeff Layton <jlayton@redhat.com>
WARNING: multiple messages have this Message-ID (diff)
From: Jeff Layton <jlayton@redhat.com>
To: kemi <kemi.wang@intel.com>, Ye Xiaolong <xiaolong.ye@intel.com>
Cc: David Howells <dhowells@redhat.com>,
lkp@01.org, Linus Torvalds <torvalds@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>
Subject: Re: [LKP] [lkp-robot] [iversion] c0cef30e4f: aim7.jobs-per-min -18.0% regression
Date: Tue, 27 Feb 2018 08:29:09 -0500 [thread overview]
Message-ID: <1519738149.4300.45.camel@redhat.com> (raw)
In-Reply-To: <8b48844f-7f9a-a9d7-b5bc-3bc403e0fa78@intel.com>
On Tue, 2018-02-27 at 15:42 +0800, kemi wrote:
>
> On 2018年02月26日 20:33, Jeff Layton wrote:
> > On Mon, 2018-02-26 at 06:43 -0500, Jeff Layton wrote:
> > > On Mon, 2018-02-26 at 16:38 +0800, Ye Xiaolong wrote:
> > > > On 02/25, Jeff Layton wrote:
> > > > > On Sun, 2018-02-25 at 23:05 +0800, kernel test robot wrote:
> > > > > > Greeting,
> > > > > >
> > > > > > FYI, we noticed a -18.0% regression of aim7.jobs-per-min due to commit:
> > > > > >
> > > > > >
> > > > > > commit: c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e ("iversion: make inode_cmp_iversion{+raw} return bool instead of s64")
> > > > > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > > > > >
> > > > > > in testcase: aim7
> > > > > > on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
> > > > > > with following parameters:
> > > > > >
> > > > > > disk: 4BRD_12G
> > > > > > md: RAID0
> > > > > > fs: xfs
> > > > > > test: disk_src
> > > > > > load: 3000
> > > > > > cpufreq_governor: performance
> > > > > >
> > > > > > test-description: AIM7 is a traditional UNIX system level benchmark suite which is used to test and measure the performance of multiuser system.
> > > > > > test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
> > > > > >
> > > > > >
> > > > >
> > > > > I'm a bit suspicious of this result.
> > > > >
> > > > > This patch only changes inode_cmp_iversion{+raw} (since renamed to
> > > > > inode_eq_iversion{+raw}), and that neither should ever be called from
> > > > > xfs. The patch is fairly trivial too, and I wouldn't expect a big
> > > > > performance hit.
> > > >
> > > > I tried to queue 4 more times test for both commit c0cef30e4f and its parent,
> > > > the result seems quite stable.
> > > >
> > > > c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e:
> > > > "aim7.jobs-per-min": [
> > > > 32964.01,
> > > > 32938.68,
> > > > 33068.18,
> > > > 32886.32,
> > > > 32843.72,
> > > > 32798.83,
> > > > 32898.34,
> > > > 32952.55
> > > > ],
> > > >
> > > > 3da90b159b146672f830bcd2489dd3a1f4e9e089:
> > > > "aim7.jobs-per-min": [
> > > > 40239.65,
> > > > 40163.33,
> > > > 40353.32,
> > > > 39976.9,
> > > > 40185.75,
> > > > 40411.3,
> > > > 40213.58,
> > > > 39900.69
> > > > ],
> > > >
> > > > Any other test data you may need?
> > > >
> > > > >
> > > > > Is IMA involved here at all? I didn't see any evidence of it, but the
> > > > > kernel config did have it enabled.
> > > > >
> > > >
> > > > Sorry, not quite familiar with IMA, could you tell more about how to check it?
> > > >
> > >
> > > Thanks for retesting it, but I'm at a loss for why we're seeing this:
> > >
> > > IMA is the the integrity management subsystem. It will use the iversion
> > > field to determine whether to remeasure files during remeasurement. It
> > > looks like the kernel config has it enabled, but it doesn't look like
> > > it's in use, based on the info in the initial report.
> > >
> > > This patch only affects two inlined functions inode_cmp_iversion and
> > > inode_cmp_iversion_raw. The patch is pretty trivial (as Linus points
> > > out). These functions are only called from IMA and fs-specific code
> > > (usually in readdir implementations to detect directory changes).
> > >
> > > XFS does not call either of these functions however, so I'm a little
> > > unclear on how this patch could slow anything down on this test. The
> > > only thing I can think to do here would be to profile this and see what
> > > stands out.
> > >
> > > Note that we do need to keep this in perspective too. This 18%
> > > regression on this test follows around a ~230% improvement that occurred
> > > when we merged the bulk of these patches. It's should still be quite a
> > > bit faster than the v4.15 in this regard.
> > >
> > > Still, it'd be good to understand what's going on here.
> > >
> > >
> >
> > Could we see the dmesg from this boot? It'd be good to confirm that IMA
> > is not involved here, as that's the only place that I can see that would
> > call into this code at all here.
> >
>
> See attachment for info on dmesg/perf-profile/compare_result.
> Feel free to let Xiaolong or me know if anything else you would like to check.
>
Many thanks,
Only one caller of the functions touched by this patch shows up in the
profiles: ima_file_free. That calls ima_check_last_writer, which calls
inode_cmp_iversion. The lines from the profiles show:
3da90b159b146672f830bcd2489dd3a1f4e9e089:
0.00% 0.00% [kernel.kallsyms] [k] ima_file_free
c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e:
0.01% 0.01% [kernel.kallsyms] [k] ima_file_free
Seems pretty insignificant, but perhaps that is somehow accounting for
the difference. This is called when a file is freed so there could be an
effect I guess if we're closing a lot of files for write.
Looking at the disassembly from the builds on my box there is some
slight difference, since we did alter the implementation.
inode->iversion is 0x150 bytes into the struct on my builds:
3da90b159b146672f830bcd2489dd3a1f4e9e089:
0xffffffff813ae858 <+136>: je 0xffffffff813ae871 <ima_file_free+161>
0xffffffff813ae85a <+138>: mov 0x150(%rbp),%rsi
0xffffffff813ae861 <+145>: mov 0x20(%rax),%rcx
0xffffffff813ae865 <+149>: and $0xfffffffffffffffe,%rsi
0xffffffff813ae869 <+153>: add %rcx,%rcx
0xffffffff813ae86c <+156>: cmp %rcx,%rsi
0xffffffff813ae86f <+159>: je 0xffffffff813ae899 <ima_file_free+201>
c0cef30e4ff0dc025f4a1660b8f0ba43ed58426e:
0xffffffff813ae828 <+136>: je 0xffffffff813ae83a <ima_file_free+154>
0xffffffff813ae82a <+138>: mov 0x150(%rbp),%rcx
0xffffffff813ae831 <+145>: shr %rcx
0xffffffff813ae834 <+148>: cmp %rcx,0x20(%rax)
0xffffffff813ae838 <+152>: je 0xffffffff813ae862 <ima_file_free+194>
The patched one looks like it ought to be more efficient. It's certainly
fewer instructions and it doesn't touch %rsi now. Are shifts more
expensive?
In any case, what might be good at this point as a sanity check is to
turn off CONFIG_IMA and recheck this. That should tell us whether we're
on the right track here. With that disabled, this patch should have no
effect on anything at all.
Thanks,
--
Jeff Layton <jlayton@redhat.com>
next prev parent reply other threads:[~2018-02-27 13:29 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-25 15:05 [lkp-robot] [iversion] c0cef30e4f: aim7.jobs-per-min -18.0% regression kernel test robot
2018-02-25 15:05 ` kernel test robot
2018-02-25 15:41 ` Jeff Layton
2018-02-25 15:41 ` Jeff Layton
2018-02-26 8:38 ` Ye Xiaolong
2018-02-26 8:38 ` Ye Xiaolong
2018-02-26 11:43 ` Jeff Layton
2018-02-26 11:43 ` Jeff Layton
2018-02-26 12:33 ` Jeff Layton
2018-02-26 12:33 ` Jeff Layton
2018-02-27 7:42 ` kemi
2018-02-27 7:42 ` [LKP] " kemi
2018-02-27 13:29 ` Jeff Layton [this message]
2018-02-27 13:29 ` Jeff Layton
2018-02-27 13:43 ` David Howells
2018-02-27 13:43 ` [LKP] " David Howells
2018-02-27 15:27 ` Jeff Layton
2018-02-27 15:27 ` [LKP] " Jeff Layton
2018-02-27 17:04 ` Linus Torvalds
2018-02-27 17:04 ` [LKP] " Linus Torvalds
2018-03-02 5:54 ` kemi
2018-03-02 5:54 ` [LKP] " kemi
2018-03-15 7:33 ` kemi
2018-03-15 7:33 ` [LKP] " kemi
2018-03-15 17:46 ` Linus Torvalds
2018-03-15 17:46 ` [LKP] " Linus Torvalds
2018-02-25 21:18 ` Linus Torvalds
2018-02-25 21:18 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1519738149.4300.45.camel@redhat.com \
--to=jlayton@redhat.com \
--cc=lkp@lists.01.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.