* xfs_release lock contention
@ 2024-08-07 4:27 Mateusz Guzik
2024-08-07 5:43 ` Dave Chinner
0 siblings, 1 reply; 4+ messages in thread
From: Mateusz Guzik @ 2024-08-07 4:27 UTC (permalink / raw)
To: david; +Cc: linux-xfs
I'm looking at false-sharing problems concerning multicore open + read +
close cycle on one inode and during my survey I found xfs is heavily
serializing on a spinlock in xfs_release, making it perform the worst
out of the btrfs/ext4/xfs trio.
A trivial test case plopped into will-it-scale is at the end.
bpftrace -e 'kprobe:__pv_queued_spin_lock_slowpath { @[kstack()] = count(); }' tells me:
[snip]
@[
__pv_queued_spin_lock_slowpath+5
_raw_spin_lock_irqsave+49
rwsem_wake.isra.0+57
up_write+69
xfs_iunlock+244
xfs_release+175
__fput+238
__x64_sys_close+60
do_syscall_64+82
entry_SYSCALL_64_after_hwframe+118
]: 41132
@[
__pv_queued_spin_lock_slowpath+5
_raw_spin_lock_irq+42
rwsem_down_read_slowpath+164
down_read+72
xfs_ilock+125
xfs_file_buffered_read+71
xfs_file_read_iter+115
vfs_read+604
ksys_read+103
do_syscall_64+82
entry_SYSCALL_64_after_hwframe+118
]: 137639
@[
__pv_queued_spin_lock_slowpath+5
_raw_spin_lock+41
xfs_release+196
__fput+238
__x64_sys_close+60
do_syscall_64+82
entry_SYSCALL_64_after_hwframe+118
]: 1432766
The xfs_release -> _raw_spin_lock thing is the XFS_ITRUNCATED flag test.
Also note how eofblock code inducing write trylock gets in the way of
doing the read (first 2 stacks).
General note is that for most files real files there are no "blocks past
eof" or otherwise truncation going on and there is presumably a way to
locklessly handle that, which should also reduce single-threaded
overhead.
For testing purposes I wrote a total hack which merely branches on
i_flags and i_delayed_blks == 0, but I have no idea if that's any good
-- *something* definitely is doable here, I leave that to people who
know the fs.
When running a kernel with a change whacking one lockref cycle on open:
https://lore.kernel.org/linux-fsdevel/20240806163256.882140-1-mjguzik@gmail.com/T/#u
... and toggling the short-circuit outlined above I'm seeing +50% ops/s
and going above btrfs. Then top of the profile is the false sharing I'm
looking at with other filesystems.
So that would be my report. Whatever you guys do with it is your
business, I'm bailing to the false-sharing problem.
Should you address this in a committable manner, there is no need to
credit or cc me.
Cheers.
the not patch:
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 7dc6f326936c..1cc62c21e709 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1079,6 +1079,8 @@ xfs_itruncate_extents_flags(
return error;
}
+extern unsigned long magic_tunable;
+
int
xfs_release(
xfs_inode_t *ip)
@@ -1089,6 +1091,13 @@ xfs_release(
if (!S_ISREG(VFS_I(ip)->i_mode) || (VFS_I(ip)->i_mode == 0))
return 0;
+ if (magic_tunable) {
+ if (!(ip->i_flags & XFS_ITRUNCATED))
+ return 0;
+ if (ip->i_delayed_blks == 0)
+ return 0;
+ }
+
/* If this is a read-only mount, don't do this (would generate I/O) */
if (xfs_is_readonly(mp))
return 0;
test case (plop into will-it-scale, say tests/openreadclose3.c and run
./openreadclose3_processes -t 24):
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <assert.h>
#define BUFSIZE 1024
static char tmpfile[] = "/tmp/willitscale.XXXXXX";
char *testcase_description = "Same file open/read/close";
void testcase_prepare(unsigned long nr_tasks)
{
char buf[BUFSIZE];
int fd = mkstemp(tmpfile);
assert(fd >= 0);
memset(buf, 'A', sizeof(buf));
assert(write(fd, buf, sizeof(buf)) == sizeof(buf));
close(fd);
}
void testcase(unsigned long long *iterations, unsigned long nr)
{
char buf[BUFSIZE];
while (1) {
int fd = open(tmpfile, O_RDONLY);
assert(fd >= 0);
assert(read(fd, buf, sizeof(buf)) == sizeof(buf));
close(fd);
(*iterations)++;
}
}
void testcase_cleanup(void)
{
unlink(tmpfile);
}
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: xfs_release lock contention
2024-08-07 4:27 xfs_release lock contention Mateusz Guzik
@ 2024-08-07 5:43 ` Dave Chinner
2024-08-07 14:37 ` Christoph Hellwig
0 siblings, 1 reply; 4+ messages in thread
From: Dave Chinner @ 2024-08-07 5:43 UTC (permalink / raw)
To: Mateusz Guzik; +Cc: linux-xfs, hch
On Wed, Aug 07, 2024 at 06:27:21AM +0200, Mateusz Guzik wrote:
> I'm looking at false-sharing problems concerning multicore open + read +
> close cycle on one inode and during my survey I found xfs is heavily
> serializing on a spinlock in xfs_release, making it perform the worst
> out of the btrfs/ext4/xfs trio.
>
> A trivial test case plopped into will-it-scale is at the end.
>
> bpftrace -e 'kprobe:__pv_queued_spin_lock_slowpath { @[kstack()] = count(); }' tells me:
> [snip]
> @[
> __pv_queued_spin_lock_slowpath+5
> _raw_spin_lock_irqsave+49
> rwsem_wake.isra.0+57
> up_write+69
> xfs_iunlock+244
> xfs_release+175
> __fput+238
> __x64_sys_close+60
> do_syscall_64+82
> entry_SYSCALL_64_after_hwframe+118
> ]: 41132
> @[
> __pv_queued_spin_lock_slowpath+5
> _raw_spin_lock_irq+42
> rwsem_down_read_slowpath+164
> down_read+72
> xfs_ilock+125
> xfs_file_buffered_read+71
> xfs_file_read_iter+115
> vfs_read+604
> ksys_read+103
> do_syscall_64+82
> entry_SYSCALL_64_after_hwframe+118
> ]: 137639
> @[
> __pv_queued_spin_lock_slowpath+5
> _raw_spin_lock+41
> xfs_release+196
> __fput+238
> __x64_sys_close+60
> do_syscall_64+82
> entry_SYSCALL_64_after_hwframe+118
> ]: 1432766
>
> The xfs_release -> _raw_spin_lock thing is the XFS_ITRUNCATED flag test.
Yeah, these all ring old bells in the back of my skull.
>
> test case (plop into will-it-scale, say tests/openreadclose3.c and run
> ./openreadclose3_processes -t 24):
>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <assert.h>
>
> #define BUFSIZE 1024
>
> static char tmpfile[] = "/tmp/willitscale.XXXXXX";
>
> char *testcase_description = "Same file open/read/close";
>
> void testcase_prepare(unsigned long nr_tasks)
> {
> char buf[BUFSIZE];
> int fd = mkstemp(tmpfile);
>
> assert(fd >= 0);
> memset(buf, 'A', sizeof(buf));
> assert(write(fd, buf, sizeof(buf)) == sizeof(buf));
> close(fd);
> }
>
> void testcase(unsigned long long *iterations, unsigned long nr)
> {
> char buf[BUFSIZE];
>
> while (1) {
> int fd = open(tmpfile, O_RDONLY);
> assert(fd >= 0);
> assert(read(fd, buf, sizeof(buf)) == sizeof(buf));
> close(fd);
Oh, yeah, I defintely sent patches once upon a time to address
this.
<scrummage around old patch stacks>
Yep, there it is:
https://lore.kernel.org/linux-xfs/20190207050813.24271-4-david@fromorbit.com/
This would completely remove the rwsem traffic from O_RDONLY file
closes.
None of it would address the XFS_ITRUNCATED contention issue, but
that's just another of those "test, test-and-clear" cases that avoid
the atomic ops by testing if the bit is set without the lock
first....
Hmmm, I thought I saw these patches go past on the list again
recently. Yeah, that was a coupl eof months ago:
https://lore.kernel.org/linux-xfs/20240623053532.857496-1-hch@lst.de/
Christoph, any progress on merging that patchset?
-Dave.
--
Dave Chinner
david@fromorbit.com
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: xfs_release lock contention
2024-08-07 5:43 ` Dave Chinner
@ 2024-08-07 14:37 ` Christoph Hellwig
2024-08-07 14:56 ` Darrick J. Wong
0 siblings, 1 reply; 4+ messages in thread
From: Christoph Hellwig @ 2024-08-07 14:37 UTC (permalink / raw)
To: Dave Chinner; +Cc: Mateusz Guzik, linux-xfs, hch
On Wed, Aug 07, 2024 at 03:43:53PM +1000, Dave Chinner wrote:
> Hmmm, I thought I saw these patches go past on the list again
> recently. Yeah, that was a coupl eof months ago:
>
> https://lore.kernel.org/linux-xfs/20240623053532.857496-1-hch@lst.de/
>
> Christoph, any progress on merging that patchset?
I expected them to go into 6.11 but they didn't. I'll resend them
after retesting them as there weren't any actionable items except for
a few typo fixes.
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: xfs_release lock contention
2024-08-07 14:37 ` Christoph Hellwig
@ 2024-08-07 14:56 ` Darrick J. Wong
0 siblings, 0 replies; 4+ messages in thread
From: Darrick J. Wong @ 2024-08-07 14:56 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Dave Chinner, Mateusz Guzik, linux-xfs
On Wed, Aug 07, 2024 at 04:37:59PM +0200, Christoph Hellwig wrote:
> On Wed, Aug 07, 2024 at 03:43:53PM +1000, Dave Chinner wrote:
> > Hmmm, I thought I saw these patches go past on the list again
> > recently. Yeah, that was a coupl eof months ago:
> >
> > https://lore.kernel.org/linux-xfs/20240623053532.857496-1-hch@lst.de/
> >
> > Christoph, any progress on merging that patchset?
>
> I expected them to go into 6.11 but they didn't. I'll resend them
> after retesting them as there weren't any actionable items except for
> a few typo fixes.
Heh, I think there's like one patch out of the whole bunch that didn't
get a RVB tag, so that's probably why it didn't go in. :(
--D
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2024-08-07 14:56 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-07 4:27 xfs_release lock contention Mateusz Guzik
2024-08-07 5:43 ` Dave Chinner
2024-08-07 14:37 ` Christoph Hellwig
2024-08-07 14:56 ` Darrick J. Wong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox