From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: kernel test robot <oliver.sang@intel.com>,
Mike Snitzer <snitzer@redhat.com>,
oe-lkp@lists.linux.dev, lkp@intel.com,
Anna Schumaker <anna@kernel.org>,
Trond Myklebust <trond.myklebust@primarydata.com>,
Thomas Haynes <loghyr@gmail.com>,
linux-nfs@vger.kernel.org
Subject: Re: [snitzer:cel-nfsd-next-6.12-rc5] [nfsd] b4be3ccf1c: fsmark.app_overhead 92.0% regression
Date: Wed, 13 Nov 2024 11:20:34 -0500 [thread overview]
Message-ID: <79e94b2225fa8e4a4723b8941eea79111f77c55f.camel@kernel.org> (raw)
In-Reply-To: <ZzTQd5NKe5WXkLlA@tissot.1015granger.net>
On Wed, 2024-11-13 at 11:14 -0500, Chuck Lever wrote:
> On Wed, Nov 13, 2024 at 11:10:35AM -0500, Jeff Layton wrote:
> > On Wed, 2024-11-13 at 10:48 -0500, Chuck Lever wrote:
> > > On Thu, Nov 07, 2024 at 06:35:11AM -0500, Jeff Layton wrote:
> > > > On Thu, 2024-11-07 at 12:55 +0800, kernel test robot wrote:
> > > > > hi, Jeff Layton,
> > > > >
> > > > > The commit message mentions that the change is expected to solve the
> > > > > "App Overhead" regression on the fs_mark test we reported in
> > > > > https://lore.kernel.org/oe-lkp/202409161645.d44bced5-oliver.sang@intel.com/
> > > > >
> > > > > However, in our tests, there is still a similar regression. At the same
> > > > > time, there is still no performance difference for fsmark.files_per_sec:
> > > > >
> > > > > 2015880 ± 3% +92.0% 3870164 fsmark.app_overhead
> > > > > 18.57 +0.0% 18.57 fsmark.files_per_sec
> > > > >
> > > > >
> > > > > Another thing: our bot bisected to this commit in the repo/branch detailed
> > > > > below. If there is a more proper repo/branch on which to test the patch,
> > > > > could you let us know? Thanks a lot!
> > > > >
> > > > >
> > > > >
> > > > > Hello,
> > > > >
> > > > > kernel test robot noticed a 92.0% regression of fsmark.app_overhead on:
> > > > >
> > > > >
> > > > > commit: b4be3ccf1c251cbd3a3cf5391a80fe3a5f6f075e ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
> > > > > https://git.kernel.org/cgit/linux/kernel/git/snitzer/linux.git cel-nfsd-next-6.12-rc5
> > > > >
> > > > >
> > > > > testcase: fsmark
> > > > > config: x86_64-rhel-8.3
> > > > > compiler: gcc-12
> > > > > test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory
> > > > > parameters:
> > > > >
> > > > > iterations: 1x
> > > > > nr_threads: 1t
> > > > > disk: 1HDD
> > > > > fs: btrfs
> > > > > fs2: nfsv4
> > > > > filesize: 4K
> > > > > test_size: 40M
> > > > > sync_method: fsyncBeforeClose
> > > > > nr_files_per_directory: 1fpd
> > > > > cpufreq_governor: performance
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > the same patch/commit), kindly add the following tags:
> > > > > > Reported-by: kernel test robot <oliver.sang@intel.com>
> > > > > > Closes: https://lore.kernel.org/oe-lkp/202411071017.ddd9e9e2-oliver.sang@intel.com
> > > > >
> > > > >
> > > > > Details are as below:
> > > > > -------------------------------------------------------------------------------------------------->
> > > > >
> > > > >
> > > > > The kernel config and materials to reproduce are available at:
> > > > > https://download.01.org/0day-ci/archive/20241107/202411071017.ddd9e9e2-oliver.sang@intel.com
> > > > >
> > > > > =========================================================================================
> > > > > compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
> > > > > gcc-12/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-8.3/1fpd/1t/debian-12-x86_64-20240206.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark
> > > > >
> > > > > commit:
> > > > > 37f27b20cd ("nfsd: add support for FATTR4_OPEN_ARGUMENTS")
> > > > > b4be3ccf1c ("nfsd: implement OPEN_ARGS_SHARE_ACCESS_WANT_OPEN_XOR_DELEGATION")
> > > > >
> > > > > 37f27b20cd64e2e0 b4be3ccf1c251cbd3a3cf5391a8
> > > > > ---------------- ---------------------------
> > > > > %stddev %change %stddev
> > > > > \ | \
> > > > > 97.33 ± 9% -16.3% 81.50 ± 9% perf-c2c.HITM.local
> > > > > 3788 ±101% +147.5% 9377 ± 6% sched_debug.cfs_rq:/.load_avg.max
> > > > > 2936 -6.2% 2755 vmstat.system.cs
> > > > > 2015880 ± 3% +92.0% 3870164 fsmark.app_overhead
> > > > > 18.57 +0.0% 18.57 fsmark.files_per_sec
> > > > > 53420 -17.3% 44185 fsmark.time.voluntary_context_switches
> > > > > 1.50 ± 7% +13.4% 1.70 ± 3% perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> > > > > 3.00 ± 7% +13.4% 3.40 ± 3% perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
> > > > > 1756957 -2.1% 1720536 proc-vmstat.numa_hit
> > > > > 1624496 -2.2% 1588039 proc-vmstat.numa_local
> > > > > 1.28 ± 4% -8.2% 1.17 ± 3% perf-stat.i.MPKI
> > > > > 2916 -6.2% 2735 perf-stat.i.context-switches
> > > > > 1529 ± 4% +8.2% 1655 ± 3% perf-stat.i.cycles-between-cache-misses
> > > > > 2910 -6.2% 2729 perf-stat.ps.context-switches
> > > > > 0.67 ± 15% -0.4 0.27 ±100% perf-profile.calltrace.cycles-pp._Fork
> > > > > 0.95 ± 15% +0.3 1.26 ± 11% perf-profile.calltrace.cycles-pp.__sched_setaffinity.sched_setaffinity.__x64_sys_sched_setaffinity.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > > > > 0.70 ± 47% +0.3 1.04 ± 9% perf-profile.calltrace.cycles-pp._nohz_idle_balance.do_idle.cpu_startup_entry.start_secondary.common_startup_64
> > > > > 0.52 ± 45% +0.3 0.86 ± 15% perf-profile.calltrace.cycles-pp.seq_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
> > > > > 0.72 ± 50% +0.4 1.12 ± 18% perf-profile.calltrace.cycles-pp.link_path_walk.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat
> > > > > 1.22 ± 26% +0.4 1.67 ± 12% perf-profile.calltrace.cycles-pp.sched_setaffinity.evlist_cpu_iterator__next.read_counters.process_interval.dispatch_events
> > > > > 2.20 ± 11% +0.6 2.78 ± 13% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
> > > > > 2.20 ± 11% +0.6 2.82 ± 12% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.read
> > > > > 2.03 ± 13% +0.6 2.67 ± 13% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read
> > > > > 0.68 ± 15% -0.2 0.47 ± 19% perf-profile.children.cycles-pp._Fork
> > > > > 0.56 ± 15% -0.1 0.42 ± 16% perf-profile.children.cycles-pp.tcp_v6_do_rcv
> > > > > 0.46 ± 13% -0.1 0.35 ± 16% perf-profile.children.cycles-pp.alloc_pages_mpol_noprof
> > > > > 0.10 ± 75% +0.2 0.29 ± 19% perf-profile.children.cycles-pp.refresh_cpu_vm_stats
> > > > > 0.28 ± 33% +0.2 0.47 ± 16% perf-profile.children.cycles-pp.show_stat
> > > > > 0.34 ± 32% +0.2 0.54 ± 16% perf-profile.children.cycles-pp.fold_vm_numa_events
> > > > > 0.37 ± 11% +0.3 0.63 ± 23% perf-profile.children.cycles-pp.setup_items_for_insert
> > > > > 0.88 ± 15% +0.3 1.16 ± 12% perf-profile.children.cycles-pp.__set_cpus_allowed_ptr
> > > > > 0.37 ± 14% +0.3 0.67 ± 61% perf-profile.children.cycles-pp.btrfs_writepages
> > > > > 0.37 ± 14% +0.3 0.67 ± 61% perf-profile.children.cycles-pp.extent_write_cache_pages
> > > > > 0.64 ± 19% +0.3 0.94 ± 15% perf-profile.children.cycles-pp.btrfs_insert_empty_items
> > > > > 0.38 ± 12% +0.3 0.68 ± 58% perf-profile.children.cycles-pp.btrfs_fdatawrite_range
> > > > > 0.32 ± 23% +0.3 0.63 ± 64% perf-profile.children.cycles-pp.extent_writepage
> > > > > 0.97 ± 14% +0.3 1.31 ± 10% perf-profile.children.cycles-pp.__sched_setaffinity
> > > > > 1.07 ± 16% +0.4 1.44 ± 10% perf-profile.children.cycles-pp.__x64_sys_sched_setaffinity
> > > > > 1.39 ± 18% +0.5 1.90 ± 12% perf-profile.children.cycles-pp.seq_read_iter
> > > > > 0.34 ± 30% +0.2 0.52 ± 16% perf-profile.self.cycles-pp.fold_vm_numa_events
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Disclaimer:
> > > > > Results have been estimated based on internal Intel analysis and are provided
> > > > > for informational purposes only. Any difference in system hardware or software
> > > > > design or configuration may affect actual performance.
> > > > >
> > > > >
> > > >
> > > > This patch (b4be3ccf1c) is exceedingly simple, so I doubt it's causing
> > > > a performance regression in the server. The only thing I can figure
> > > > here is that this test is causing the server to recall the delegation
> > > > that it hands out, and then the client has to go and establish a new
> > > > open stateid in order to return it. That would likely be slower than
> > > > just handing out both an open and delegation stateid in the first
> > > > place.
> > > >
> > > > I don't think there is anything we can do about that though, since the
> > > > feature seems to be working as designed.
> > >
> > > We seem to have hit this problem before. That makes me wonder
> > > whether it is actually worth supporting the XOR flag at all. After
> > > all, the client sends the CLOSE asynchronously; applications are not
> > > being held up while the unneeded state ID is returned.
> > >
> > > Can XOR support be disabled for now? I don't want to add an
> > > administrative interface for that, but also, "no regressions" is
> > > ringing in my ears, and 92% is a mighty noticeable one.
> >
> > To be clear, this increase is in the "App Overhead", which covers all of
> > the operations that fall outside of what is actually being measured in
> > this test. I did run this test for a bit and got similar results, but
> > was never able to nail down where the overhead came from. My speculation
> > is that it's the recall and reestablishment of an open stateid that
> > slows things down, but I could never fully confirm it.
> >
> > My issue with disabling this is that the decision of whether to set
> > OPEN_XOR_DELEGATION is up to the client. The server in this case is
> > just doing what the client asks. ISTM that if we were going to disable
> > (or throttle) this anywhere, that should be done by the client.
>
> If the client sets the XOR flag, NFSD could simply return only an
> OPEN stateid, for instance.
>
>
Sure, but then it's disabled for everyone, for every workload. There
may be workloads that benefit, too. What would really be nice is another
datapoint. To the HS guys:
If you run this test against the Anvil both with and without
OPEN_XOR_DELEGATION enabled, do you also see an increase in the "App
Overhead"? It would be nice to know whether this effect is common among
servers that implement this, or whether it's something particular to
knfsd.
--
Jeff Layton <jlayton@kernel.org>
Thread overview:
2024-11-13 16:10 ` [snitzer:cel-nfsd-next-6.12-rc5] [nfsd] b4be3ccf1c: fsmark.app_overhead 92.0% regression Jeff Layton
2024-11-13 16:14 ` Chuck Lever
2024-11-13 16:20 ` Jeff Layton [this message]
2024-11-13 16:37 ` Chuck Lever III
2024-11-13 17:02 ` Jeff Layton